A self-guided resource and short video offering a general overview of data management practices
Data management involves developing strategies and processes to ensure that research data are well organized, formatted, described, and documented during a project’s lifecycle to support the potential sharing and archiving of resultant data. General good practices and resources are highlighted in this guide and short video.

Learn about services available for publishing data and making it openly accessible at Duke.
This workshop provides an overview of Duke's Research Data Repository. The general functionalities of the platform and tips for submitting data will be discussed, along with how repositories can help researchers comply with funder and journal policies and meet growing standards around data stewardship and sharing, such as the FAIR Guiding Principles.

Part of a series of data management 101 workshops developed for scientists, social scientists, and humanists, respectively.
Humanists work with various media, content and materials (sources) as part of their research. These sources can be considered data. This workshop introduces data management practices for humanities researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented through a humanities lens with discipline-based, concrete examples.

Learn about OSF features and tools that support research project management and reproducibility.
The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science. The OSF can help scholars manage their workflow, organize their materials, and share all or part of a project with the broader research community. This workshop will demonstrate some of the key functionalities of the tool including how to structure your materials, manage permissions, version content, integrate with third-party tools (such as Box, GitHub, or Mendeley), share materials, register projects, and track usage. This workshop was presented in the Spring of 2018.

A workshop taught in collaboration with the Information Technology Security Office (ITSO), designed to teach faculty and staff best practices for handling sensitive data.
In the course of your research you may collect, interact with, or analyze data that are classified as “Sensitive” or “Restricted” according to Duke's data classification standard. In this workshop we will examine common sensitive data types, how Duke’s IRB and ITSO expect you to protect that data throughout your project’s lifecycle, and the resources available to you for sensitive data storage and analysis, data de-identification, and data archiving and sharing.

A workshop developed to help early career researchers curate and publish their data.
Data management practices help researchers take care of their data throughout the entire research process, from the planning phase to the end of a project, when data might be shared or “published” within a repository. This hands-on workshop teaches strategies for preparing data for publishing: participants will “curate” an example dataset and identify common data issues using the Data Curation Network “CURATE” model. Participants will also learn about the overall role of repositories within the data sharing landscape and strategies for locating and assessing repositories.

Guides to data analysis using Excel

A data-first language boasting packages for ingest, analysis, reports & visualization

A guide to resources and links for numerical computing

Humanities-focused guides for turning single spreadsheets into normalized tables using Excel
The purpose of this workshop is to demonstrate simple steps in Excel that you can take to transform a single spreadsheet (such as a master copy of your data that you used to facilitate the gathering process) into a series of normalized tables that can be used to populate a relational database model using, for example, MySQL. In order to accomplish this, we first identify entities and their corresponding attributes within the master datasheet, and then create separate tables for each entity that can be connected to one another by foreign keys (columns that reference, by means of an identification code, columns present in other tables). This workshop does not require any coding experience, but it is recommended that users be familiar with Excel basics.
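The same normalization idea can also be sketched in code. Below is a minimal Python/pandas version, using a hypothetical artists-and-artworks master sheet, that splits one flat table into two entity tables linked by a generated foreign key:

    import pandas as pd

    # Hypothetical master sheet: one row per artwork, artist details repeated
    master = pd.DataFrame({
        "artwork_title": ["Starry Night", "Irises", "Water Lilies"],
        "artist_name":   ["Van Gogh", "Van Gogh", "Monet"],
        "artist_nation": ["Dutch", "Dutch", "French"],
    })

    # Entity 1: artists, de-duplicated, with a generated ID (the future primary key)
    artists = (master[["artist_name", "artist_nation"]]
               .drop_duplicates()
               .reset_index(drop=True))
    artists["artist_id"] = artists.index + 1

    # Entity 2: artworks, carrying artist_id as a foreign key instead of repeated text
    artworks = (master.merge(artists, on=["artist_name", "artist_nation"])
                [["artwork_title", "artist_id"]])

    # Export each normalized table for loading into MySQL
    artists.to_csv("artists.csv", index=False)
    artworks.to_csv("artworks.csv", index=False)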

Programming power for non-coders
OpenRefine allows for easy exploration of data: define facets within the data, identify inconsistencies, and quickly clean and transform values. OpenRefine is an intuitive yet powerful tool for normalizing data. Use it before importing a dataset into a presentation application (e.g., for mapping, charting, or analysis).

Short video motivating why humanities scholars could benefit from learning the Python programming language

Introduction to using the Pandas module in Python for reading and accessing your tabular (spreadsheet) data
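A minimal pandas sketch, with a hypothetical filename and column names, of the reading-and-accessing pattern this introduction covers:

    import pandas as pd

    # Read a spreadsheet export into a DataFrame (hypothetical file)
    df = pd.read_csv("inventory.csv")

    # Access a column by name, a row by position, and quick numeric summaries
    print(df["title"].head())
    print(df.iloc[0])
    print(df.describe())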

Slightly more advanced Pandas with Python for making your tabular (spreadsheet) data tidy and joining together multiple data tables
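A minimal sketch, again with hypothetical files and columns, of the tidying (melt) and joining (merge) steps covered here:

    import pandas as pd

    surveys = pd.read_csv("surveys.csv")    # columns: site_id, q1_2017, q1_2018
    sites = pd.read_csv("site_info.csv")    # columns: site_id, site_name, region

    # Tidy: melt the year columns into one (year, score) pair per row
    tidy = surveys.melt(id_vars=["site_id"],
                        value_vars=["q1_2017", "q1_2018"],
                        var_name="year", value_name="score")

    # Join: attach site attributes to each response via the shared key
    combined = tidy.merge(sites, on="site_id", how="left")
    print(combined.head())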

Use regex to find patterns in text
Regular expressions are a powerful method of finding patterns in text. For example: find all words ending in "ing"; all words which begin with a capital letter; all telephone area codes that begin with either the numbers 7 or 8; all email addresses which contain "duke.edu". Many programming languages use regular expressions as a means to support pattern matching.
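In Python, for instance, the examples above look like this (the sample sentence is invented for illustration):

    import re

    text = "Running late? Call 786-555-0123 or 812-555-0456, or email jane.doe@duke.edu."

    print(re.findall(r"\b\w+ing\b", text))                 # words ending in "ing": ['Running']
    print(re.findall(r"\b[A-Z][a-z]+\b", text))            # words starting with a capital: ['Running', 'Call']
    print(re.findall(r"\b([78]\d{2})-\d{3}-\d{4}", text))  # area codes starting with 7 or 8: ['786', '812']
    print(re.findall(r"[\w.]+@duke\.edu", text))           # emails containing "duke.edu": ['jane.doe@duke.edu']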
Many research projects involve textual data, and computational advances now provide the means to engage in various types of automated text analysis that can enhance these projects. Understanding what analysis techniques are available and where they can appropriately be applied is an important first step to beginning a text analysis project. This hands-on approach to text analysis will give a quick overview of small- and large-scale text-based projects before addressing strategies for organizing and conducting text analysis projects. Tools for data collection, parsing and eventual analysis will be introduced and demonstrated. The workshop will focus on acquiring and preparing text sources for small-scale projects and text-based visualizations, but many of the techniques will be useful for larger projects as well. For this introduction, the focus will primarily be on using Graphical User Interface (GUI) tools like Microsoft Excel and Google Refine, instead of programming languages and command line approaches.

Gathering data from websites, HTML & JSON Parsing, APIs and gathering Twitter streams
Preexisting clean datasets, such as the General Social Survey (GSS) or Census data, are readily available, cover long periods of time, and have well-documented codebooks. However, some researchers want to gather their own data. Recent tools and techniques for finding and compiling data from webpages, whole websites, or social media sources have become more accessible, but they introduce an additional layer of complexity.
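A minimal Python sketch of the webpage-gathering step, using a hypothetical URL and the requests and BeautifulSoup libraries:

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical page; substitute a site you have permission to scrape
    url = "https://example.com/events"
    html = requests.get(url, timeout=10).text

    # Parse the HTML and pull the text out of every <h2> heading
    soup = BeautifulSoup(html, "html.parser")
    headings = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
    print(headings)

    # Many sites also expose JSON APIs, which skip HTML parsing entirely
    records = requests.get("https://example.com/api/events.json", timeout=10).json()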

General tips for creating better data visualization and visual communication

Things to consider when trying to create or improve your posters

Using Tableau to create easy interactive charts and maps for data exploration and communication

Basic Adobe Illustrator skills and tools for creating diagrams

Some basic skills, plus additional Adobe Illustrator issues that come up when cleaning up charts

Using the Python module Altair for data visualization and exploration that can be displayed on the web
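A minimal Altair sketch with invented data, showing the explore-then-publish-to-web pattern:

    import altair as alt
    import pandas as pd

    # Invented yearly counts to explore
    df = pd.DataFrame({"year": [2015, 2016, 2017, 2018],
                       "count": [12, 18, 9, 24]})

    # Declare encodings; Altair renders an interactive Vega-Lite chart
    chart = alt.Chart(df).mark_line(point=True).encode(x="year:O", y="count:Q")

    # Saving to HTML produces a standalone page viewable in any browser
    chart.save("counts.html")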

Guide to Esri's platform for creating interactive maps and web applications
ArcGIS Online (AGOL) is a companion to the ArcGIS client that allows members of a group to store and share spatial data online and that can be used independently or in conjunction with the client. We'll discuss aspects of the AGOL organizational account, adding and accessing content, creating map and feature services, creating and sharing web maps and presentations, publishing web applications, and using analysis tools.

Mapping and spatial analysis with the latest desktop GIS application from Esri

Legacy desktop GIS software from Esri

Importing GIS data into Google Earth

Free and open-source desktop GIS software
Looking for an open-source option for GIS? QGIS is a free alternative to ArcGIS. In this workshop we will demonstrate how to import and analyze data in QGIS and discuss the benefits of using QGIS over other GIS software.

Using the R language to process, analyze, and visualize geospatial data
R has become a popular and reproducible option for supporting spatial and statistical analysis. This hands-on workshop will demonstrate how to plot x/y coordinates; generate thematic choropleths with US Census and other federal data; import, view, and produce shapefiles; and create Leaflet maps for viewing on the web.

Tableau's capabilities to visualize spatial data

Overview of tools for developing interactive online maps

Current and historical financial data, financial markets news, and economic data

Geospatial datasets for local, US, and international areas

Sources for data on international trade between countries, at aggregate and commodity levels

The premier source for detailed commodity-level trade data between countries

Sources to get data from the US population and economics Censuses, both current and historical

Economic data sources from US federal government agencies