DaisyParry
# The Turing Way - Project Design Chapter based on Data Study Group

## Life cycle of a project overview

![](https://zenodo.org/api/iiif/v2/e4125eaf-b456-4097-85fc-6a2e80482d1c:f6c04b0b-0f25-4062-9287-6c1ea94317bf:1728_TURI_Book%20sprint_36%20data%20research%20cycle_040619.jpg/full/750,/0/default.jpg)

## Overview

This case study gives an overview of the various elements that a project manager must consider when embarking on a new data science project, from the proposal to the research phase, including:

* Scoping & defining
* Data preparation
* Recruiting talent
* Data access and security
* Ethics assessment
* Impact planning

### Who can help?

- Daisy Parry
- Jules Manser
- Will be opened for review by anyone who is interested

## Elements of project design

# Scoping & Definition

### Initial Considerations

The items below must be considered at the conception stage so that the feasibility of the project can be assessed and the project designed effectively. A great research question sadly doesn’t always equate to a great project.

#### What / How

* What are the research objectives of the project? Do the research objectives make sense? For example, have many similar projects taken place before? If so, is this piece of research still necessary because it examines a new or novel aspect?
* What data will be used for this project? How will this data be collected, how will permission be obtained to use it, and how will it be cleaned, merged and formatted in preparation for the research? Will the data contain sensitive information? If so, how will the data be stored and accessed to ensure it remains secure?
* What methods of analysis is the project proposing, and what research resources will be required to support them?
* How will the results be used when the project has concluded, to ensure that all the hard work doesn’t end up in the drawer?

#### Who

* Who will be involved in preparing and conducting the research and presenting the findings? Are they already in place? If not, how will they be recruited?
* Are your institution and researchers the right people for the job? Are there existing groups that should be invited to collaborate or be consulted before work begins?

#### Why

* What will the impact be on the immediate stakeholders of the project? What is the impact on the research community of this work being undertaken? What are the possible benefits and harms to wider society? What are the ethical implications of the project, and how can these be mitigated?

### Idea to Proposal

When an idea is first conceived, it is unlikely to arrive as a perfectly formed research question. It will likely enter a scoping phase that teases out a sensible and feasible version of the question and its sub-questions. For example, if the key question is fairly general or wide-reaching, it will need to be broken down into sub-questions that can then be tackled empirically. The proposed data might then no longer be suitable for answering the adjusted question, so further re-framing may be necessary until the data can reasonably be expected to answer it.

### The Initial Proposal

The project proposal (or a similar document) is one of the first items that will be produced. It will likely be used to pitch the project to collaborators and funders, and to spread the word of its conception to stakeholders and the research community. An initial review of the data should take place before the proposal is written, to identify obvious issues. If there are major issues, these should be addressed, or at least mitigation plans defined, before the proposal is finalised. The initial proposal does not need to be perfect and should be considered a document for iteration. The proposal should map out the key elements of the project as far as is possible at the time of writing. The exercise of writing the proposal will likely reveal gaps, and that is OK!

The project proposal will likely be written by the scientific researchers involved in the project, with peer review encouraged. It's not expected that the full team will be in place by this stage, and the proposal may well be used to attract talent to the project. Equally, the project partner could have authorship: in the context of Data Study Groups, it is the organisation proposing the project that writes the project proposal. In such cases we advise that the proposal is subject to academic review.

Concretely, a data science project proposal should include:

* A description of the project and the challenge it seeks to solve.
* Background information: Why is solving this challenge beneficial? What are the main difficulties, and what approaches have been considered or tried before (if applicable)? How is the project necessary to solving the problem, and what is the scientific basis for this?
* Data information: What data will be used for the project? It might be useful to include information on each dataset, including a data inventory, size, variable descriptions, a description of the data collection mechanism, the level of data sensitivity/confidentiality, etc. How will this data need to be handled? Are there any known issues with working with data of this kind, and how will they be mitigated? How will permission be obtained to use this data?
* Project impact: What is the expected impact of the project on its direct stakeholders, wider society and the research community? What would channels of follow-up work look like after the project, and what is the intended use of any findings?

# Data Preparation

Consideration of the data's suitability, readiness and collection should begin at the time of project proposal development. If the data is unsuitable, incomplete or won't be ready in time, then the whole project could be compromised. Data preparation should begin as soon as the project question is finalised. Initial data readiness will be different for every project.
In some cases the data may not exist yet, and devising a method of collection will be the first action. In other cases, pre-curated data sets may be used, with the key action being to obtain the rights to use them. Data is research ready when it has been collected, constructed, cleaned and checked for gaps, when potential sensitivities have been considered (and mitigated), and when you have the right to use and work on the datasets.

## Key Considerations

* Data Readiness: If the data doesn't yet exist, a method of collection must be devised. If the raw data exists but has yet to be curated for purpose, a methodology for this must be established. If the data is nearly ready, then it must be checked for gaps and cleanliness.
* Data Appropriateness: Data should be highly relevant to the research question. Even if the data looks suitable, there may be additional data sets or resources that already exist and can enrich and improve the scope of the research.
* Data Quantity: The data must be large enough to run analysis and experimentation effectively. If not enough data points exist, a method of obtaining more should be pursued. This could be by obtaining complementary data sets, generating synthetic data or collecting new data. Equally, researchers may be faced with an abundance of data, so much that meaningful analysis becomes hard. In such cases researchers may need to edit and refine what data will be used.
* Data Sensitivity: How sensitive is the data that will be used? The more sensitive the data set is, the more restrictions will be needed to protect it during the research phase, which limits researchers' ease of access and experimentation. If the data contains personal information or sensitive commercial information, then it is likely to be highly sensitive. In such cases it may be possible to reduce the sensitivity of the data by either removing or anonymising the sensitive elements.
  Even if the data is not especially sensitive, the project as a whole may produce sensitive results, so it may be worth taking the same measures to reduce sensitivity as much as possible. This reduces the security measures that will be necessary and minimises the negative implications of a data breach.
* Data Completeness / Reliability: The data should be checked for missing observations and unreliable data points. Any incompleteness or unreliability must be assessed for its impact on the project, along with what can be done to minimise missing values and maximise overall reliability.
* Data Permissions / Legal Considerations: It is essential that you have the right to use the data. If the data has been generated in house and involves no human data, then obtaining permission may not be necessary. However, in many cases the data will come from a collaborator or from a third-party data provider. In these cases, at minimum a data sharing agreement should be put in place so that both parties are protected. Depending on the organisations involved, this may be a straightforward process, or it may take many iterations and discussions with legal teams. Data sharing agreements should therefore be discussed from the outset and be one of the first actions when considering data preparation. If the data contains personal information, such as patient data, then you must be able to prove that the subjects have consented to their data being collected and used.

# Recruiting Talent

The team needed will vary from project to project. In most cases, a project will require a Principal Investigator (or similar role). They will likely be a senior academic with expertise in the project area and experience of leading similar investigations. They will act as a champion of the project and as its scientific sounding board. A PI may be in place naturally from the start of the project, or they may need to be recruited as the project comes to fruition.
For larger projects a full-time research team may need to be recruited; for smaller projects, support from PhD candidates may be more appropriate.

# Data Access & Security

It is essential that third-party providers have confidence that their data will be handled appropriately with the necessary security measures, and equally that researchers are protected while working on the project. The sensitivity of the data, or its conditions of use, will dictate how it can be shared with researchers and how they can work on it.

When assessing the data’s sensitivity, it is important to consider the project as a whole and the wider uses of the data. For example, publicly available satellite images might not seem sensitive, but if the project then seeks to extract a list of properties with specific features, this list could become quite sensitive, especially if the occupants have not consented to their addresses being on such a list. Another example is Twitter posts: these are public and can be scraped by anyone, but if you were to compile a list of users with certain political beliefs, this would become extremely sensitive information.

## Data Transfer

If a project is considered sensitive, security measures will be required when the data is transferred from a third party to the research manager. Tools such as Azure Storage Explorer could be used, in which a secure one-way upload link is sent to the device uploading the data. This link is only usable from that device's specific public-facing IP address. Alternatively, the data could be transferred by hand. Each institution is likely to have its own preferred method of transferring data of this nature, so a discussion may be required to meet the system and security needs of each party.

## Data Storage & Access

When deciding how to store the data, managers should consider who will need access and what they will need to do with the data in terms of research. This may impact the storage method used.
Non-sensitive data may not require the full security measures that sensitive data does. However, it should still be stored securely, so that only those who need access have it. This is good practice for handling data.

Sensitive data, once transferred, should be placed in a secure location with appropriate security measures. This could be a research environment such as a 'Safe Haven' or an equivalent trusted secure platform. Depending on the storage method, this may need to be deployed or set up beforehand, so be sure to establish whether it is needed early on.

Following the Turing’s 'Safe Haven' model of secure research environments, there are various security measures that can be implemented depending on the sensitivity of the contents. These can include restrictions on internet access, on packages, and on copy and paste functions from within the environment. The more restrictions are in place within the environment, the slower it will be for researchers to work on the project. It is therefore essential that only necessary measures are in place. Researchers and the data provider should agree beforehand on how sensitive the project is, as well as on what security measures should be present.

# Ethics Assessment

In most academic institutions it is necessary for incoming projects to go through some kind of ethics application and approval process. Even if this is not a requirement, the principles of ethics assessment should be applied to the project. The ethics application and approval process should be integrated into the thinking about the project, rather than treated as administrative box ticking. The ethics process is to be completed by the academics working on the project, who have the scientific insight, rather than by research project managers. Academics must collate the required information. If more information is required for a full ethics assessment, it is the academics' responsibility to obtain this from collaborators.

The core ethical questions to consider are listed below. The function of each is to identify risks and to inform how the research plans to minimise or eliminate them:

* What are the ethical implications of the project?
* How was the data collected?
* How was consent obtained for collecting the data?
* Will any issues of privacy & security arise from the project or its resulting outputs?
* How does the project plan to keep the data private & secure?

# Impact planning

It is important to consider what will happen after the project so that the work doesn't end up in the drawer. This should be done as the project is being designed, as it might influence the tools used. Start with the end in mind so that, when the project finishes, plans are in place to do it justice and ensure the findings are put to meaningful use. Channels for ensuring longevity might include:

* Planning follow-on projects: could the work undertaken lead on to a bigger follow-on project? If so, who might this be relevant to? Are there relevant research groups or academics that could be approached?
* Publishing: can the work be published after the project is complete? If the project contains sensitive information and data, can a version be published that omits the sensitive information? This will maximise impact on the scientific research community, but be careful that redaction does not alter the overall narrative. Again, this is a consideration at the project proposal stage.
* Exposure: how will the work be publicised to reach the widest audience? Are there relevant newsletters or bulletins that it could be included in, or social media groups that it could be shared via?
* Formatting: if the work or code can be shared, could the format be designed so that it is as compatible as possible with a wide audience? For example, by using widely used and accessible tools rather than specific paid-for versions.
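
As an illustration of the completeness check described under Data Preparation (checking for missing observations), the following is a minimal sketch rather than a recommended tool. It assumes the data is available as a CSV file. The function name `missing_value_report`, the set of strings treated as "missing", and the sample columns are all hypothetical and would need adapting to the real dataset.

```python
import csv
import io

# Assumption: these strings mark a missing observation; adjust per dataset.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def missing_value_report(rows, fieldnames):
    """Return {column: fraction of rows in which the value is missing}."""
    counts = {name: 0 for name in fieldnames}
    total = 0
    for row in rows:
        total += 1
        for name in fieldnames:
            value = row.get(name)
            # csv.DictReader fills absent trailing fields with None.
            if value is None or value.strip().lower() in MISSING_MARKERS:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in fieldnames}
    return {name: counts[name] / total for name in fieldnames}


# Hypothetical sample data standing in for a real project dataset.
SAMPLE_CSV = """site,reading,recorded_on
A,12.5,2019-06-01
B,,2019-06-01
C,NA,
D,9.1,2019-06-02
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
report = missing_value_report(reader, reader.fieldnames)
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%} missing")
```

A per-column report like this could be attached to the data information section of the proposal. Whether a given level of missingness is acceptable remains a judgement for the research team.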
