Data Management
  for Librarians:
    An Introduction


      February 19th 2013


       Gareth Knight
          Manager
     RDM Support Service
What is Data?
 “Data are facts, observations or experiences on which an argument, theory or
test is based. Data may be numerical, descriptive or visual. Data may be raw or
                   analysed, experimental or observational.”
                http://research.unimelb.edu.au/integrity/conduct/data/review


                    May originate from various sources: 
                        Primary and/or secondary
                        May contain different content:
                        Quantitative and/or qualitative
                     May be expressed in different forms:
 Datasets, still images, audio‐video, audio recordings, interactive resources  
                 May be held in a number of variations:
            Raw, cleaned, anonymised/pseudonymised, analysed
                   May be encoded in different formats:
                   MS Excel, TIFF, MPEG2, STATA, FoxPro


           What type of data do you have at home?
Data in the Research Lifecycle
 Research project cycle:
   Brainstorm
   Develop Proposal (produce Data Management Plan)
   Plan Project
   Perform Research
   Write-up Results
   Finalise & submit
   Share
   Archive
 Data cycle (shown alongside Perform Research):
   Create / Reuse
   Analyse
   Store
   Describe
   Share
What is Data Management?
1.    Plan
      • Determine requirements
      • Identify risks & opportunities
      • Decide approach
2.    Implement
3.    Monitor
      • Evaluate approach
      • Change approach/perform
        corrective action
4.    Evaluate
      • Is it fit for purpose?
      • What additional action is
        needed?

     ‘Benign neglect’ and poorly made decisions in the short term will have long-term implications
Short-term decisions
   with long-term implications
     Software products          File formats & standards




Data organisation & labelling       Quality Controls
Why does data need to be managed?
 • Ensure data can be located
 • Enable analysis
 • Comply with Funder & School requirements
 • Ability to understand for current and future need
 • Enable sharing & validation

   “Interesting paper. Where’s the data?”
Researcher Challenges
 Issues/challenges encountered when creating, managing,
 and sharing research data (web survey results)

 Response type: multiple choice checkbox + free text for other challenges

 Other challenges
 • Database creation & management
 • Storage of physical questionnaires
 • Lack of time
 • Software instability (particularly NVivo)
 • Ability to enter & access data at different locations
Training Needs
Interest in training on topics related to data management (web survey results)




                                                                             Note:
                                                         Graph omits percentages for other responses
                                                             (None, slight, moderate, no opinion)
RDM Support Service
 Location of Library staff

 Role of Library staff
 • Provide first point of contact
 • Help researchers to express requirements & needs
 • Direct to potential solution (staff, website)
 • Contribute to training activities
 • Incorporate data considerations into teaching
Data Access Over Time
       digital vs. analogue
 “traditionally, preserving things meant keeping them unchanged; 
 however … if we hold on to digital information without 
 modifications, accessing the information will become increasingly 
 more difficult, if not impossible.”
 Su‐Shing Chen, 2001




 data + computer + OS + application = information content
Change in Process over Time
 Environments compared: Intel PC, 2000; Mac laptop, 2006; X64 Ubuntu laptop, 2010
 Layers shown for each: hardware, operating system, software application, information content
Task
• Select two of the following problems when managing digital data:
   1.   Difficulty locating data
   2.   Difficulty accessing media
   3.   Difficulty rendering data in an understandable form
   4.   Difficulty recreating data as originally intended
   5.   Difficulty understanding information content
   6.   Uncertain provenance

Consider the following questions:
a. In what circumstances will the chosen problem occur?
b. What consequences may follow if the problem occurs (e.g. financial
   implications)?
c. How could you ensure that the problem doesn’t occur?
d. What could you do to resolve the problem after it has
   occurred? (Can direct to someone for help)
1. Difficulty Locating Data
                  Problem
     “I created some data 5 years ago. Where is it?”
“I’ve lost my original disk. Do I have the data elsewhere?”

          Scenarios & Reasons
              Loss of storage media
      Lots of data stored in many locations
    Vague filenames make it difficult to locate

           (Potential) Solutions
Preventative:
•    Copy data to several storage devices – increase likelihood
     of finding it
Post event:
•    Find better discovery software?
•    Attempt to recreate content?
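
One way to act on the preventative advice is to keep an inventory of what is stored in each location, so that data can be found again by name, date, or content. The following is a minimal sketch only: the function names `build_inventory` and `write_inventory` and the CSV columns are illustrative assumptions, not part of any particular tool.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """List every file under root with its size, modification time and checksum."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": stat.st_size,
                "modified": modified.isoformat(),
                "sha256": sha256_of(path),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory as a CSV file that can be searched later."""
    fields = ["path", "bytes", "modified", "sha256"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Running this against each storage location and comparing the checksum column shows which files exist in more than one place, which helps answer "Do I have the data elsewhere?".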
2. Difficulty accessing Media
                 Problem
     “How do I access this old media?”
       “Why can’t I read this disk?”

          Scenario & Reasons

             Media obsolescence
        Physical deterioration & failure

                             (Potential) Solutions
Preventative:
•   Copy data to several storage devices
•   Transfer data to new storage media on obsolescence / every 3 years
•   Deposit data into a data archive and/or copy to server
Post event:
•   Data recovery software
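
Copying data to several storage devices only helps if the copies are intact. As a hedged sketch, the hypothetical helper `copy_and_verify` below copies one file into several folders and compares checksums; the folder paths passed in are assumed to be mount points of the different devices.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destinations):
    """Copy source into each destination folder and confirm the copies match.

    Returns a dict mapping each copied file path to True (checksum matches)
    or False (the copy differs from the source).
    """
    source = Path(source)
    expected = file_digest(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps where possible
        results[str(target)] = file_digest(target) == expected
    return results
```

The same digest check can be repeated when data is transferred to new media, to confirm that nothing was damaged in the move.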
Potential Storage Locations
 Local machine & storage
   Pros: Cheap, high capacity storage, fast access
   Cons: Lack of support; potential for theft, loss, or damage

 Academic storage systems (recommended)
   Pros: Automatic monitoring & backup, multiple redundancy, remote access, secure (if required)
   Cons: Limited space allocation, not always accessible overseas

 Third party service providers
   Pros: Automated backup, accessible in different countries (usually)
   Cons: Security concerns, ownership concerns, services can close account at any time
http://www.flickr.com/photos/m0n0/4479450696/
3. Difficulty Rendering Data
                  Problem
               “How can I view data?”
    “Where do I find software to access my data?”

           Scenarios & Reasons
             Software obsolescence
    New software uses a different decoding method


           (Potential) Solutions
Preventative:
•     Transform data to new formats (format conversion strategy)
•     Maintain original machine and software to access content (computer museum)
Post event:
•     Track down original software product
•     Emulate original environment (emulation/virtualisation)
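
A format conversion strategy can be illustrated with a very simple case: rewriting a delimited text file from an older character encoding into comma-separated UTF-8. This is a sketch under stated assumptions; the function name `migrate_to_utf8_csv`, the Latin-1 source encoding, and the tab delimiter are all illustrative, and proprietary formats such as STATA or FoxPro need format-specific tools instead.

```python
import csv


def migrate_to_utf8_csv(source_path, target_path,
                        source_encoding="latin-1", delimiter="\t"):
    """Rewrite a delimited text file as comma-separated UTF-8.

    Returns the number of rows written so the result can be checked
    against the original.
    """
    rows_written = 0
    with open(source_path, "r", newline="", encoding=source_encoding) as src, \
            open(target_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

The original file is left untouched, so it can remain the master copy until the converted version has been checked.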
Choosing File Formats

 Stages: Creation, Preservation, Dissemination

 Content Type    Preferred Format              Acceptable Alternatives
 Documents       Rich Text Format              Microsoft DocX, Open Document Format
 Still Images    TIFF,                         PNG, RAW
                 JPEG 2000 (uncompressed)
 Audio           WAV, AIFF, FLAC               MP3
 Audio-video     MPEG2, MPEG4

 When working with multiple copies, decide which is the master copy
4. Difficulty Maintaining
            Authenticity
                Problem
      “Why does my data look different?”

         Scenarios & Reasons
A new version of a software application uses a different
               decoding method
      A different software application is in use
         (Potential) Solutions
Preventative:
•   Determine significant properties that should be maintained
•   Maintain original machine and software to access content (computer museum)
Post event:
•   Emulate original environment (emulation/virtualisation)
5. Difficulty Understanding
             Content
                Problem
     “Where was this information created?”
    “Why did the creator make this decision?”
        “What does this value mean?”
  “How does this data relate to other content?”

         Scenarios & Reasons
Memory fails – cannot remember decisions made
    Disorganised and poorly labelled data
            Lack of documentation
        (Potential) Solutions
• Organise data (chronology, experiment type,
  location, content type)
• Adopt labelling conventions
• Documentation

  Does a Rosetta stone exist for your data?
Filename conventions
• Consider the elements that will help you to organise and locate 
  content
     – E.g. Participant ID, site of data collection, date of data collection

•   Consider how data files and directories may be organised & sorted
     – 001, 002, 003, 004, can be used for sequential files
     – YYYY‐MM‐DD (2012‐12‐04) useful for organising by date (use year first)

•   Identify different versions of content in filename (and in content)
     – Creation date (YYYY-MM-DD)
     – Version/draft number

•   Consider how your filenames will look to others
     – Avoid spaces ‐ ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web
     – Avoid capitalisation ‐ Alters file sorting & CAUSES HEADACHES!




                          Golden Rule: Be Consistent
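
The conventions above can be applied automatically so that every file is named the same way. The sketch below is one possible scheme, not a standard: the function `build_filename`, the order of the elements, and the underscore separator are assumptions chosen for illustration.

```python
import re
from datetime import date


def build_filename(participant_id, site, collected, version, extension):
    """Build a consistent, sortable filename from its descriptive elements."""
    # Lowercase the site and replace anything that is not a letter or digit
    # with a hyphen, so the name contains no spaces or capitals.
    site_part = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    return "{:03d}_{}_{}_v{:02d}.{}".format(
        participant_id,          # zero-padded so 002 sorts before 010
        site_part,
        collected.isoformat(),   # YYYY-MM-DD, year first
        version,
        extension.lower().lstrip("."),
    )


print(build_filename(7, "North Clinic", date(2012, 12, 4), 2, "csv"))
# prints: 007_north-clinic_2012-12-04_v02.csv
```

Because the participant number is zero-padded and the date is written year first, an ordinary alphabetical sort puts the files in participant and date order.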
Data Documentation
                 What would someone want to know if they
                  were looking at your data the first time?
1. What is the context of creation?
•    Why did you create it? For what purpose?
•    What methodology did you use? What assumptions were made?
•    Who is the target audience?
2. Collection and set of files:
•    What information does each file contain?
•    When was it created?
•    By whom?
•    What actions were performed?
•    How does the data contained in the collection relate to each other?
3.    Individual components
•    What is the meaning of this word/column/row, etc.?
•    How are these items measured?
•    What are the boundaries of the measurement?
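
For the "individual components" questions, a simple data dictionary gives each column a place where its meaning and measurement can be recorded. As a minimal sketch, the hypothetical function `data_dictionary_template` below reads the header row of a CSV file and writes a blank dictionary for the data creator to complete; the dictionary's field names are assumptions, not a formal metadata standard.

```python
import csv


def data_dictionary_template(data_csv_path, dictionary_csv_path):
    """Create a blank data dictionary with one row per column of a CSV file."""
    with open(data_csv_path, "r", newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle))
    fields = ["column", "meaning", "unit_or_scale", "allowed_values", "notes"]
    with open(dictionary_csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for name in header:
            # Only the column name is known; the rest is left empty
            # for the data creator to fill in.
            writer.writerow({"column": name})
    return header
```

The context of creation and the description of the collection as a whole still need to be written by hand, for example in a plain-text README stored alongside the files.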
6. Uncertain Provenance
                    Problem
1. “When was the data created and/or modified?”
2. “Who created/modified the data?”
3. “Why was it created and/or modified?”

             Scenarios & Reasons
•     Lack/Loss of trust in information content
•     Reluctance to use information content
            (Potential) Solutions
    Preventative:
    • Limit update to authorised users only
    • Store change history
    • Keep each version
    Post event:
    • Locate data creator & editor?
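
Storing a change history and keeping each version can be done with very little tooling. The sketch below is an illustration under assumptions: the function `record_version`, the `changes.jsonl` log name, and the fields recorded are hypothetical, and restricting updates to authorised users is a matter for file permissions or the storage system rather than this code.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def record_version(data_file, versions_dir, editor, reason):
    """Keep a copy of the current file and append an entry to a change log."""
    data_file = Path(data_file)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)

    log_path = versions_dir / "changes.jsonl"
    # The version number is the number of entries already logged, plus one.
    if log_path.exists():
        previous = log_path.read_text(encoding="utf-8").splitlines()
    else:
        previous = []
    number = len(previous) + 1

    kept_name = "{}_v{:03d}{}".format(data_file.stem, number, data_file.suffix)
    kept_copy = versions_dir / kept_name
    shutil.copy2(data_file, kept_copy)

    entry = {
        "version": number,
        "file": kept_copy.name,
        "when": datetime.now(timezone.utc).isoformat(),
        "who": editor,
        "why": reason,
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        "sha256": hashlib.sha256(kept_copy.read_bytes()).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Each log entry answers the three provenance questions on this slide: when the data was changed, who changed it, and why.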
Things to Recommend
Advise researchers to:

1. Choose an appropriate storage location and create backups

2. Organise data in a consistent and logical manner

3. Document the data and information content (as well as structure)

4. Consider how to ensure that information can be accessed in
   the long term

5. Consider potential for data sharing and ensure it is performed with 
   consideration of ethics  
A Few Good References
• Digital Curation Centre
  http://www.dcc.ac.uk/resources
• MANTRA – Data Management training for PhD students
  http://datalib.edina.ac.uk/mantra/
• UK Data Archive – Managing and Sharing Data
  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• Cambridge University – RDM Guidance
  http://www.lib.cam.ac.uk/dataman/index.html
• Australian National Data Service
  http://ands.org.au/resource/data-management-planning.html
• LSHTM Research Data Management Support Service
  http://blogs.lshtm.ac.uk/rdmss/

Who Decides? Reinterpreting archival processes for the management of digital ...Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital research
 

Digital Data Management Challenges and Solutions

  • 1. Data Management for Librarians: An Introduction February 19th 2013 Gareth Knight Manager RDM Support Service
  • 2. What is Data? “Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational.” http://research.unimelb.edu.au/integrity/conduct/data/review May originate from various sources: primary and/or secondary. May contain different content: quantitative and/or qualitative. May be expressed in different forms: datasets, still images, audio-video, audio recordings, interactive resources. May be held in a number of variations: raw, cleaned, anonymised/pseudonymised, analysed. May be encoded in different formats: MS Excel, TIFF, MPEG2, STATA, FoxPro. What type of data do you have at home?
  • 3. Data in the Research Lifecycle [Cycle diagram: Brainstorm, Develop Proposal, Plan Project, Perform Research, Write-up Results, Finalise & submit]
  • 4. Data in the Research Lifecycle [Same cycle diagram, with "Produce Data Management Plan" added at the Develop Proposal stage]
  • 5. Data in the Research Lifecycle [Same cycle diagram, with the Perform Research stage expanded into data activities: Create / Describe, Analyse, Store, Share, Reuse]
  • 6. Data in the Research Lifecycle [Same cycle diagram, with "Share" and "Archive" added at the Finalise & submit stage, alongside the Perform Research data activities: Create / Describe, Analyse, Store, Share, Reuse]
  • 7. What is Data Management? 1. Plan • Determine requirements • Identify risks & opportunities • Decide approach 2. Implement 3. Monitor • Evaluate approach • Change approach/perform corrective action 4. Evaluate • Is it fit for purpose? • What additional action is needed? ‘Benign neglect’ and poorly made decisions in the short term will have long-term implications
  • 8. Short-term decisions with long-term implications Software products File formats & standards Data organisation & labelling Quality Controls
  • 9. Why does data need to be managed? • Ensure data can be located • Enable analysis • Ability to understand for current and future need • Enable sharing & validation (“Interesting paper. Where’s the data?”)
  • 10. Why does data need to be managed? • Ensure data can be located • Enable analysis • Comply with Funder & School requirements • Ability to understand for current and future need • Enable sharing & validation (“Interesting paper. Where’s the data?”)
  • 11. Researcher Challenges Issues/challenges encountered when creating, managing, and sharing research data (web survey results) Response type: multiple-choice checkbox plus free text for other challenges. Other challenges • Database creation & management • Storage of physical questionnaires • Lack of time • Software instability (particularly NVivo) • Ability to enter & access data at different locations
  • 12. Training Needs Interest in training on topics related to data management (web survey results) Note: Graph omits percentages for other responses (None, slight, moderate, no opinion)
  • 13. RDM Support Service Location of Library staff
  • 14. RDM Support Service Role of Library staff • Provide first point of contact • Help researchers to express requirements & needs • Direct to potential solution (staff, website) • Contribute to training activities • Incorporate data considerations into teaching Location of Library staff
  • 15. Data Access Over Time digital vs. analogue “traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.” Su-Shing Chen, 2001 data + computer + OS + application = information content
  • 16. Change in Process over Time Intel PC, 2000; Mac laptop, 2006; X64 Ubuntu laptop, 2010. Layers shown: hardware, operating system, software application, information content
  • 17. Change in Process over Time Intel PC, 2000; Mac laptop, 2006; X64 Ubuntu laptop, 2010. Layers shown: hardware, operating system, software application, information content
  • 18. Task • Select two of the following problems when managing digital data: 1. Difficulty locating data 2. Difficulty accessing media 3. Difficulty rendering data in an understandable form 4. Difficulty recreating data as originally intended 5. Difficulty understanding information content 6. Uncertain provenance Consider the following questions: a. In what circumstances will the chosen problem occur? b. What consequences may follow if the problem occurs (e.g. financial implications)? c. How could you ensure that the problem doesn’t occur? d. What could you do to resolve the problem after it has occurred? (Can direct to someone for help)
  • 19. 1. Difficulty Locating Data Problem “I created some data 5 years ago. Where is it?” “I’ve lost my original disk. Do I have the data elsewhere?” Scenarios & Reasons Loss of storage media Lots of data stored in many locations Vague filenames make it difficult to locate (Potential) Solutions Preventative: • Copy data to several storage devices to increase the likelihood of finding it Post event: • Find better discovery software? • Attempt to recreate content?
  • 20. 2. Difficulty accessing Media Problem “How do I access this old media?” “Why can’t I read this disk?” Scenario & Reasons Media obsolescence Physical deterioration & failure (Potential) Solutions Preventative: • Copy data to several storage devices • Transfer data to new storage media on obsolescence / every 3 years • Deposit data into a data archive and/or copy to server Post event: • Data recovery software
  • 21. Potential Storage Locations Local machine & storage: Pros: cheap, high capacity storage, fast access. Cons: lack of support; potential for theft, loss, or damage. Recommended Academic Storage Systems: Pros: automatic monitoring & backup, multiple redundancy, remote access, secure (if required). Cons: limited space allocation, not always accessible overseas. Third party service providers: Pros: automated backup, accessible in different countries (usually). Cons: security concerns, ownership concerns, services can close account at any time. http://www.flickr.com/photos/m0n0/4479450696/
  • 22. 3. Difficulty Rendering Data Problem “How can I view data?” “Where do I find software to access my data?” Scenarios & Reasons Software obsolescence New software uses a different decoding method (Potential) Solutions Preventative: • Transform data to new formats (format conversion strategy) • Maintain original machine and software to access content (computer museum) Post event: • Track down original software product • Emulate original environment (emulation/virtualisation)
  • 23. Choosing File Formats Creation Preservation Dissemination Content Type Preferred Format Acceptable Alternatives Documents Rich Text Format Microsoft DocX Open Document Format Still Images TIFF PNG, JPEG 2000 (uncompressed) RAW Audio Wav format MP3 AIFF FLAC AudioVideo MPEG2, MPEG4 When working with multiple copies, decide which is the master copy
  • 24. 4. Difficulty Maintaining Authenticity Problem “Why does my data look different?” Scenarios & Reasons New version of software application uses a different decoding method Different software application in use (Potential) Solutions Preventative: • Determine significant properties that should be maintained • Maintain original machine and software to access content (computer museum) Post event: • Emulate original environment (emulation/virtualisation)
  • 25. 5. Difficulty Understanding Content Problem “Where was this information created? Why did the creator make this decision?” “What does this value mean?” “How does this data relate to other content?” Scenarios & Reasons Memory fails: cannot remember decisions made Disorganised and poorly labelled data Lack of documentation (Potential) Solutions • Organise data (chronology, experiment type, location, content type) • Adopt labelling conventions • Documentation Does a Rosetta stone exist for your data?
  • 26. Filename conventions • Consider the elements that will help you to organise and locate content – E.g. Participant ID, site of data collection, date of data collection • Consider how data files and directories may be organised & sorted – 001, 002, 003, 004 can be used for sequential files – YYYY-MM-DD (2012-12-04) useful for organising by date (use year first) • Identify different versions of content in filename (and in content) – Creation date (YY-MM-DD) – Version/draft number • Consider how your filenames will look to others – Avoid spaces: ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web – Avoid capitalisation: alters file sorting & CAUSES HEADACHES! Golden Rule: Be Consistent
  • 27. Data Documentation What would someone want to know if they were looking at your data the first time? 1. What is the context of creation? • Why did you create it? For what purpose? • What methodology did you use? What assumptions were made? • Who is the target audience? 2. Collection and set of files: • What information does each file contain? • When was it created? • By whom? • What actions were performed? • How does the data contained in the collection relate to each other? 3. Individual components • What is the meaning of this word/column/row, etc.? • How are these items measured? • What are the boundaries of the measurement?
  • 28. 6. Uncertain Provenance Problem 1. “When was the data created and/or modified?” 2. “Who created/modified the data?” 3. “Why was it created and/or modified?” Scenarios & Reasons • Lack/Loss of trust in information content • Reluctance to use information content (Potential) Solutions Preventative: • Limit update to authorised users only • Store change history • Keep each version Post event: • Locate data creator & editor?
  • 29. Things to Recommend Advise researchers to: 1. Choose an appropriate storage location and create backups 2. Organise data in a consistent and logical manner 3. Document the data and information content (as well as structure) 4. Consider how they will ensure that information can be accessed in the long term 5. Consider potential for data sharing and ensure it is performed with consideration of ethics
  • 30. A Few Good References • Digital Curation Centre http://www.dcc.ac.uk/resources • MANTRA – Data Management training for PhD students http://datalib.edina.ac.uk/mantra/ • UK Data Archive – Managing and Sharing Data http://www.data‐archive.ac.uk/media/2894/managingsharing.pdf • Cambridge University – RDM Guidance http://www.lib.cam.ac.uk/dataman/index.html • Australian National Data Service http://ands.org.au/resource/data‐management‐planning.html • LSHTM Research Data Management Support Service http://blogs.lshtm.ac.uk/rdmss/
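
Slides 19, 20 and 29 recommend copying data to several storage devices and creating backups. The following is a minimal Python sketch of that advice, not a method taken from the slides. The function names `sha256_of` and `copy_to_locations` are hypothetical, and the checksum comparison is an added assumption: it is one common way to confirm that each copy matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination directory and check each copy.

    Returns a mapping of copied file path -> True if the copy's checksum
    matches the checksum of the original file.
    """
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
        results[target] = sha256_of(target) == expected
    return results
```

A researcher might call `copy_to_locations(Path("survey.csv"), [Path("/mnt/disk_a"), Path("/mnt/disk_b")])` (illustrative paths only) and investigate any entry that comes back `False`. Re-running the checksum at intervals could also help spot the physical deterioration mentioned on slide 20.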

Editor's Notes

  1. Data may refer to physical and digital artefacts
  2. http://www.flickr.com/photos/marc_smith/5943394090/
  3. http://www.flickr.com/photos/johnhurn/2419971258/ http://www.flickr.com/photos/nonny/199568095/ http://www.flickr.com/photos/calotype46/6683293291/ http://www.flickr.com/photos/eq/4990131757/
  4. http://www.flickr.com/photos/johnhurn/2419971258/ http://www.flickr.com/photos/nonny/199568095/ http://www.flickr.com/photos/calotype46/6683293291/ http://www.flickr.com/photos/eq/4990131757/
  5. 117 respondents
  6. Provide first point of contact for library visitors
  7. Provide first point of contact for library visitors
  8. Strategy is format conversion, computer museum, emulation
  9. Strategy is format conversion, computer museum, emulation
  10. A good file name is contextual, tailored to research needs
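
The filename conventions on slide 26 (fixed elements, zero-padded sequence numbers, year-first dates, version numbers, no spaces, no capitals) can be sketched as a small Python helper. This is an illustration under stated assumptions, not a convention prescribed by the slides: the function name `build_filename` and the pattern `<site>_p<NNN>_<YYYY-MM-DD>_v<NN>.<ext>` are hypothetical and should be tailored to the research project.

```python
import re
from datetime import date


def build_filename(site: str, participant_id: int, collected: date,
                   version: int, extension: str) -> str:
    """Build a lowercase, space-free filename from fixed elements.

    Hypothetical pattern: <site>_p<NNN>_<YYYY-MM-DD>_v<NN>.<ext>
    """
    # Lowercase the site name and replace each run of characters other than
    # letters and digits (spaces, punctuation) with a single hyphen.
    site_part = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    ext = extension.lower().lstrip(".")
    return (
        f"{site_part}_p{participant_id:03d}_"   # zero-padded ID sorts correctly
        f"{collected.isoformat()}_v{version:02d}.{ext}"  # ISO date puts year first
    )


print(build_filename("North Clinic", 7, date(2012, 12, 4), 2, ".CSV"))
# north-clinic_p007_2012-12-04_v02.csv
```

Generating names from one function, rather than typing them by hand, is one way to follow the slide's golden rule of consistency: every file gets the same element order, padding and date format.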