Data Organization, Cleaning, Analysis and Visualization



July 8-10, 13, 15, & 17, 2020

2:00 pm - 4:00 pm on July 8; 9:00 am - 11:00 am on all other days

Instructors: Eve Bohnett, Lorraine Ling, Rachel Lombardi, Sichong Peng

Helpers: Altaf Kassam

General Information

Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its target audience is researchers who have little to no prior computational experience, and its lessons are domain specific, building on learners' existing knowledge to enable them to quickly apply skills learned to their own research. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

Week 1

Pre-workshop survey & Setup
July 8 Intro to Carpentries, Spreadsheets/OpenRefine
July 9 Intro to R
July 10 Starting with Data

Week 2

July 13 Manipulating Data
July 15 Data Visualisation
July 17 TBD, audience's choice: More Dataframe Manipulation with dplyr, More plotting with ggplot2, or Functions Explained

To participate in a Data Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but learners may find it useful too.

The setup instructions for the Data Carpentry Ecology workshops (with R) can be found at the workshop overview site.


During the workshop, we will use Slack as a place for learners to ask questions, get help, and participate in polls. You will receive an email invitation to join the Slack workspace from instructor Eve Bohnett. If you have not yet received this email, please contact the workshop host, Altaf Kassam.

Spreadsheet Program

To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same. For this workshop, we recommend using either Microsoft Excel (paid software) or LibreOffice (free and open source). Other spreadsheet programs may not have all of the features we will be exploring in this workshop.

To install LibreOffice, go to the LibreOffice download page. The website should automatically select the correct option for your operating system. Click the "Download" button. You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically. Once the installer is downloaded, double-click on it (you may need to open your Downloads folder) and LibreOffice should install.

OpenRefine

OpenRefine is a Java program that runs on your local machine (not in the cloud). Although it displays in your browser, no web connection is needed and your data remains local. You need to have a ‘Java Runtime Environment’ (JRE) installed on your computer to run OpenRefine. If you don’t already have one installed, you can download and install one by going to the Java download site and clicking “Free Java Download”.
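To see whether a Java runtime is already present before downloading one, a check along the following lines can be run in a terminal. This is a minimal sketch, assuming a POSIX-style shell (the macOS or Linux terminal, or a Bash shell on Windows); `have_cmd` is a hypothetical helper name used only for illustration.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  # Show the version of the Java runtime that was found.
  java -version
else
  echo "Java not found: install a JRE before running OpenRefine"
fi
```

If a version is reported, a JRE is most likely already installed and the download step can be skipped.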

To install OpenRefine, go to the OpenRefine download page. From the download page, select either "Windows kit", "Mac kit", or "Linux kit" - depending on your operating system - and follow the instructions next to your download link. This lesson has been tested with OpenRefine versions up to 3.2. **If you are using an older version, we recommend upgrading to the latest tested version.** After installing on a Mac, you can delete the installer `.dmg` file.

On a Mac, you may get an error message saying that the application "can't be opened because it is from an unidentified developer." If you get this message, open your system preferences and click "Security & Privacy". You will see a message saying that the application "was blocked from opening because it is from an unidentified developer." Click "Open Anyway" and "Yes". OpenRefine should open in your default web browser.

OpenRefine does not support Internet Explorer or Edge. Please use Firefox, Chrome or Safari instead.


R and RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.


Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise, problems may occur later, for example when installing R packages.

Mac OS X

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.


Linux

You can download the binary files for your distribution from CRAN, or you can use your package manager (e.g., for Debian/Ubuntu run `sudo apt-get install r-base` and for Fedora run `sudo yum install R`). Also, please install the RStudio IDE.
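After installing, one way to confirm that R is available from the terminal is sketched below. It assumes a POSIX-style shell; `r_status` is a hypothetical helper name, not part of R or the workshop materials.

```shell
# r_status is a hypothetical helper: it reports whether R is on PATH.
r_status() {
  if command -v R >/dev/null 2>&1; then
    # The first line of the version banner names the installed R version.
    R --version | head -n 1
  else
    echo "R is not installed (or not on PATH)"
  fi
}

r_status
```

If the function reports that R is missing right after an install, opening a new terminal window and running it again is a reasonable first step.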