
R Studio Basics for Data Visualization: Make the data usable

For the R Studio Data Visualization Workshops, November 2018

What does usable data mean?

"Make the data usable" could mean different things depending on the characteristics of the dataset you have uploaded.

Here are some things to consider:

  • Do I need to filter the data in some way so that trends can become visible? Examples: taking just the top 10 most popular dog names, looking only at adults 18 and over, looking only at minors under 18, etc.
  • Do I need to rename or explain part of the data to ggplot for it to display correctly? Example: Showing the numbers 1 or 2 isn't helpful in a display key if that is how the data codes male and female. I would have to tell ggplot that 1 = Male and 2 = Female.
  • Is the data "countable" for ggplot to do a stat count? Example: if the counts are already there for a variable, geom_bar has nothing to count!
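The last two questions can be sketched in code. This is only an illustration under stated assumptions: the survey and name_counts data frames and their columns are made up for this example, not taken from the workshop data. It uses factor() to label the codes 1 and 2, then contrasts geom_bar(), which counts rows, with geom_col(), which uses counts that already exist.

```r
library(ggplot2)

# Hypothetical survey data (an assumption): one row per person, sex coded 1 or 2
survey <- data.frame(
  sex = c(1, 2, 2, 1, 2),
  age = c(34, 17, 52, 18, 12)
)

# Tell R what the codes mean so the plot key reads Male / Female
survey$sex <- factor(survey$sex, levels = c(1, 2), labels = c("Male", "Female"))

# One row per person: geom_bar() counts the rows in each group for us
bar_from_rows <- ggplot(survey, aes(x = sex)) + geom_bar()

# Hypothetical pre-counted data (an assumption): the counts are already a column
name_counts <- data.frame(
  Name  = c("Bella", "Max", "Luna"),
  Count = c(90, 75, 60)
)

# Counts already exist: geom_col() uses Count directly as the bar height
bar_from_counts <- ggplot(name_counts, aes(x = Name, y = Count)) + geom_col()
```

Typing bar_from_rows or bar_from_counts in the console draws the corresponding plot.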

 

Basic filter code

Because our dataset might be too big or contain things we don't want to include in our visualizations, we will want to filter it.

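Here is the basic code structure to make a filter. Replace each part in curly braces with your own names; the outer parentheses tell R to print the result as well as save it: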

({namefilter} <- filter({data}, {variablename} == {quality or quantity}))

Example: 

(TopDogs <- filter(Dog_Names, Count > 53))

This keeps only the rows where Count is greater than 53, so the filtered dataset TopDogs contains just the most popular dog names.
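To try this filter without the workshop file, here is a minimal sketch that runs on its own. The small Dog_Names data frame below is a made-up stand-in, and the Name column is an assumption; only Count and the filter itself come from the example above.

```r
library(dplyr)

# Hypothetical stand-in for the workshop's Dog_Names dataset (an assumption)
Dog_Names <- data.frame(
  Name  = c("Bella", "Max", "Luna", "Rex", "Daisy"),
  Count = c(90, 75, 53, 12, 54)
)

# Keep only rows where Count is greater than 53.
# The outer parentheses print TopDogs as well as saving it.
(TopDogs <- filter(Dog_Names, Count > 53))
```

Luna, with a Count of exactly 53, is left out because the filter asks for more than 53. If filter() gives an unexpected error, check that library(dplyr) has been run first, since base R has a different function with the same name.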

We could make many different filters with the Dog Names data, using any of the following comparison operators:

== equals

> greater than

< less than

>= greater than or equal to

<= less than or equal to
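Each operator in the list above can be tried in a filter. The sketch below again uses a made-up Dog_Names data frame (the Name column and all values are assumptions), and the names on the left of each arrow are just illustrative.

```r
library(dplyr)

# Hypothetical stand-in for the workshop's Dog_Names dataset (an assumption)
Dog_Names <- data.frame(
  Name  = c("Bella", "Max", "Luna", "Rex", "Daisy"),
  Count = c(90, 75, 53, 12, 54)
)

exactly_53 <- filter(Dog_Names, Count == 53)  # equals (note the double equals sign)
over_53    <- filter(Dog_Names, Count > 53)   # greater than
under_53   <- filter(Dog_Names, Count < 53)   # less than
at_least   <- filter(Dog_Names, Count >= 53)  # greater than or equal to
at_most    <- filter(Dog_Names, Count <= 53)  # less than or equal to

# == also works with text, as long as the text is in quotes
just_max <- filter(Dog_Names, Name == "Max")
```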

 

Wrangling data like this uses the dplyr package. filter() is just one of many functions in the dplyr package. See the Resources tab for sources that explain other functions that will help make your data usable.
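As a small taste of those other functions, here is a hedged sketch using two of them, arrange() and select(), together with base R's head() to take the top few names. The Dog_Names data frame and its Name and Owner_Zip columns are made up for illustration and are not from the workshop data.

```r
library(dplyr)

# Hypothetical stand-in for the workshop's Dog_Names dataset (an assumption)
Dog_Names <- data.frame(
  Name      = c("Bella", "Max", "Luna", "Rex", "Daisy"),
  Count     = c(90, 75, 53, 12, 54),
  Owner_Zip = c("10001", "10002", "10003", "10004", "10005")
)

# arrange() sorts the rows; desc() puts the largest Count first
Sorted <- arrange(Dog_Names, desc(Count))

# head() keeps the first rows of the sorted data: here, the top 3 names
Top3 <- head(Sorted, 3)

# select() keeps only the columns we want to plot
(Top3 <- select(Top3, Name, Count))
```

For a top 10 list, change the 3 in head() to 10.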