Data Organization, Cleaning, Analysis and Visualization



July 8-10, 13, 15, & 17th, 2020

2:00 pm - 4:00 pm on July 8th only; 9:00 am - 11:00 am on all other days

Instructors: Eve Bohnett, Lorraine Ling, Rachel Lombardi, Sichong Peng

Helpers: Altaf Kassam

General Information

Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its target audience is researchers who have little to no prior computational experience, and its lessons are domain specific, building on learners' existing knowledge to enable them to quickly apply skills learned to their own research. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Good Enough Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: This training will take place online. The instructors will provide you with the information you will need to connect to this meeting.

When: July 8-10, 13, 15, & 17th, 2020. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.

Contact: Please email for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.

Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Week 1

Before starting Pre-workshop survey & Setup
July 8 Intro to Carpentries
Spreadsheets/OpenRefine
July 9 Intro to R
July 10 Starting with Data

Week 2

July 13 Manipulating Data
July 15 Data Visualisation
July 17 TBD, audience's choice
More Dataframe Manipulation with dplyr
More plotting with ggplot2
Functions Explained
End Post-workshop survey



To participate in a Data Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but learners may find it useful too.

The setup instructions for the Data Carpentry Ecology workshops (with R) can be found at the workshop overview site.


During the workshop, we will use Slack as a place for learners to ask questions, get help, and participate in polls. You will receive an email invitation to join the Slack workspace from instructor Eve Bohnett. If you have not yet received this email, please contact the workshop host, Altaf Kassam.

Spreadsheet Program

To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same. For this workshop, we recommend using either Microsoft Excel (paid software) or LibreOffice (free and open source). Other spreadsheet programs may not have all of the features we will be exploring in this workshop.

To install LibreOffice, go to their download page. The website should automatically select the correct option for your operating system. Click the "Download" button. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically. Once the installer is downloaded, double-click on it (you may need to open your Downloads folder) and LibreOffice should install.

OpenRefine

OpenRefine is a Java program that runs on your local machine (not in the cloud). Although it displays in your browser, no web connection is needed and your data remains local. You need to have a ‘Java Runtime Environment’ (JRE) installed on your computer to run OpenRefine. If you don’t already have one installed, you can download and install one by going to the site and clicking “Free Java Download”.
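If you are not sure whether a Java runtime is already installed, a quick check from a terminal can tell you. The following is a minimal sketch, assuming a Mac or Linux terminal with a POSIX shell; the function name java_status is made up for this example and is not part of Java or OpenRefine.

```shell
#!/bin/sh
# Report whether a working Java runtime can be started from this shell.
# java_status is a hypothetical helper name used only for this sketch.
java_status() {
  # "java -version" exits with a non-zero status when no runtime is usable.
  if java -version >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it to stdout to capture it.
    echo "Java found: $(java -version 2>&1 | head -n 1)"
  else
    echo "Java not found: install a JRE before starting OpenRefine"
  fi
}

java_status
```

If the check reports that Java was not found, install a JRE as described above and run the check again.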

To install OpenRefine, go to their download page. From the download page, select either "Windows kit", "Mac kit", or "Linux kit" - depending on your operating system - and follow the instructions next to your download link. This lesson has been tested with all versions of OpenRefine up to the latest tested version, 3.2. **If you are using an older version, it is recommended you upgrade to the latest tested version.** On a Mac, you can delete the installer `.dmg` file after installing.

You may get an error message: " can't be opened because it is from an unidentified developer." If you get this message, open your system preferences and click "Security & Privacy". You will see a message " was blocked from opening because it is from an unidentified developer." Click "Open Anyway" and "Yes". OpenRefine should open in your default web browser.

OpenRefine does not support Internet Explorer or Edge. Please use Firefox, Chrome or Safari instead.


R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.


Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Mac OS X

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.


Linux

You can download the binary files for your distribution from CRAN, or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R). Also, please install the RStudio IDE.
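The package manager route can be sketched as a small shell script. This is only an illustration, assuming a POSIX shell; the helper names r_install_cmd and detect_pkg_manager are made up for this example. The script prints the suggested command instead of running it, so nothing is installed until you run the command yourself.

```shell
#!/bin/sh
# Print the R install command for a given package manager name.
# r_install_cmd is a hypothetical helper, not part of R or any distribution.
r_install_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get install r-base" ;;  # Debian/Ubuntu
    yum)     echo "sudo yum install R" ;;           # Fedora
    *)       echo "Download the binary files for your distribution from CRAN" ;;
  esac
}

# Detect which of the two package managers is available on this machine.
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get"
  elif command -v yum >/dev/null 2>&1; then
    echo "yum"
  else
    echo "unknown"
  fi
}

# Show the suggested command; review it, then run it yourself.
echo "Suggested: $(r_install_cmd "$(detect_pkg_manager)")"

# After installing, confirm that R can be found on the PATH.
if command -v R >/dev/null 2>&1; then
  R --version | head -n 1
else
  echo "R is not on the PATH yet"
fi
```

Once R reports its version, RStudio should be able to find it when it starts.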