Exploitation / Usage

Statistical Analysis


  • R statistics software
  • Python statistics tools (pandas, seaborn, ipython, scipy)
  • Matlab


  • Which statistical tests are possible with the collected data? (This should have been planned BEFORE the experiment!)
  • Ensure that the analysis techniques fit the distribution of the data (e.g. the mean summarises a unimodal distribution well but is misleading for a bimodal one)
  • If the hypotheses could not be supported by the data, can different hypotheses be derived?
  • Make sure to identify the impact of your results on the theoretical background!
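
The point about distributions can be illustrated with a minimal sketch that uses only Python's standard `statistics` module. The sample values and the helper `count_near` are made up for illustration and are not taken from any real experiment. The sketch shows that both the mean and the median of a bimodal sample can fall into a region where no observation lies.

```python
import statistics

# Hypothetical measurements (e.g. response times in seconds) that form
# two clearly separated clusters; the numbers are invented for illustration.
samples = [1.0, 1.1, 0.9, 1.0, 1.2, 3.0, 3.1, 2.9, 3.0, 3.2]

mean = statistics.mean(samples)
median = statistics.median(samples)


def count_near(values, center, radius):
    """Count how many values lie within `radius` of `center`."""
    return sum(1 for v in values if abs(v - center) <= radius)


# Number of observations close to each summary value and to each cluster.
near_mean = count_near(samples, mean, 0.5)
near_median = count_near(samples, median, 0.5)
near_low_cluster = count_near(samples, 1.0, 0.5)
near_high_cluster = count_near(samples, 3.0, 0.5)

print(f"mean={mean:.2f}, observations within 0.5 of it: {near_mean}")
print(f"median={median:.2f}, observations within 0.5 of it: {near_median}")
print(f"observations near 1.0: {near_low_cluster}, near 3.0: {near_high_cluster}")
```

Plotting a histogram before choosing a test (for example with seaborn or Matlab) serves the same purpose on real data: it reveals whether a single central value describes the sample at all.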

Publishing Strategies

  • The publishing strategy should already have been determined BEFORE the experiment (see Planning)
  • Consider potential additional publication(s), e.g. of the data set itself
  • Be sure to comply with the consent forms

Dissemination of the data set

  • Record who receives a login each time you grant a person access to the data. This yields usage statistics and lets you contact users later.
    • Have users agree to this logging before providing access.
  • Document the process for handing over the data set to a potential user.
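
One possible way to keep such an access log is sketched below using only the Python standard library. The field names, the functions `record_access` and `usage_by_affiliation`, and the example users are assumptions made for illustration; a real log would be written to a file instead of an in-memory buffer.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical log layout; adapt the fields to what the consent forms allow.
FIELDS = ["date", "name", "email", "affiliation"]


def record_access(log_file, name, email, affiliation, agreed_to_logging, when=None):
    """Append one access grant to the log; refuse users who did not agree."""
    if not agreed_to_logging:
        raise ValueError("user must agree to logging before access is granted")
    writer = csv.DictWriter(log_file, fieldnames=FIELDS)
    writer.writerow({
        "date": (when or date.today()).isoformat(),
        "name": name,
        "email": email,
        "affiliation": affiliation,
    })


def usage_by_affiliation(log_file):
    """Count access grants per affiliation as a simple usage statistic."""
    reader = csv.DictReader(log_file, fieldnames=FIELDS)
    return Counter(row["affiliation"] for row in reader)


# Demo with an in-memory buffer standing in for the real log file.
log = io.StringIO()
record_access(log, "A. Example", "a@example.org", "Example University", True, date(2020, 1, 15))
record_access(log, "B. Example", "b@example.org", "Example University", True, date(2020, 2, 1))
log.seek(0)
stats = usage_by_affiliation(log)
print(stats)
```

Refusing to write an entry without agreement ties the log to the consent step, so that no access is recorded for a user who has not accepted the logging.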

Long-term strategies

  • Where and how will the consent forms be preserved?
  • Long-term storage of the corpus data
    • Backups
  • Complete documentation
    • Completely document the final format of the data set, the tools used to produce it, and the tools required to read it. This ensures that users can still use the data set after the maintainer has left. If the data set is intended for the public, it also avoids having to write the same explanations to individual users over and over again.
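
Backups and documentation can be supported by a checksum manifest that lists every file of the corpus with its size and hash, so that a restored copy can be checked against the original. The following is a minimal sketch under that assumption; the function names and the file `session01.csv` are hypothetical, and the demo runs in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large corpus files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its size and SHA-256."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def changed_files(root, manifest):
    """Return manifest entries that are missing or differ in the copy at root."""
    current = build_manifest(root)
    return sorted(p for p in manifest if current.get(p) != manifest[p])


# Demo: a one-file stand-in for the corpus, checked before and after damage.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "session01.csv").write_text("t,value\n0,1\n")
    manifest = build_manifest(corpus)
    intact = changed_files(corpus, manifest)
    (corpus / "session01.csv").write_text("t,value\n0,2\n")
    damaged = changed_files(corpus, manifest)

print(intact, damaged)
```

The manifest can be stored next to the documentation (for example as JSON via `json.dump`) and re-checked whenever a backup is restored or the data set is handed over.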


  • Identify next steps
    • Given the results, which aspects of the system should be changed? Is a follow-up experiment necessary or possible?