I try to identify key traits within the Participant data, as well as relationships between its variables.
In this take-home exercise, I apply the skills I have learned in Lesson 1 and Hands-on Exercise 1 to reveal the demographics of the city of Engagement, Ohio, USA, using appropriate static statistical graphics methods.
Before getting started, I install tidyverse (if it is not already installed) and load it with the code chunk below.
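packages <- c('tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package to the current session
  library(p, character.only = TRUE)
}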
The code chunk below imports Participants.csv from the data sub-folder into R using read_csv() of readr, and saves it as a tibble data frame called participant_data.
participant_data <- read_csv("data/Participants.csv")
I’d like to start by understanding the variables householdSize, haveKids, age, educationLevel, interestGroup and joviality. I leave participantId out because it is only an identifier and provides no additional information. To do that, I create a bar plot for each variable except joviality, because the first five take discrete values while joviality is continuous.
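As a rough sketch of the difference between the two plot types, the chunk below draws a bar plot for a discrete variable and a histogram for a continuous one. The data frame toy_participants and its values are made up so the chunk runs on its own, and the bin width of 0.25 is an arbitrary choice; in the exercise itself the same calls would be pointed at participant_data.

```r
library(ggplot2)

# Made-up rows for illustration only; NOT the real Participants.csv values.
toy_participants <- data.frame(
  householdSize = c(1, 2, 3, 2, 1, 3, 2, 1),
  joviality     = c(0.12, 0.85, 0.47, 0.66, 0.30, 0.91, 0.55, 0.08)
)

# Bar plot: one bar per distinct household size, bar height = number of rows.
# factor() makes ggplot2 treat the numeric sizes as categories.
household_plot <- ggplot(toy_participants, aes(x = factor(householdSize))) +
  geom_bar() +
  labs(x = "householdSize", y = "Number of participants")

# Histogram: groups the continuous joviality scores into bins of width 0.25.
joviality_plot <- ggplot(toy_participants, aes(x = joviality)) +
  geom_histogram(binwidth = 0.25, boundary = 0) +
  labs(x = "joviality", y = "Number of participants")

print(household_plot)
print(joviality_plot)
```

A bar plot counts rows per category, so it suits variables with a handful of distinct values. A histogram first groups a continuous variable into ranges, which is why joviality is handled separately.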
householdSize
It’s nice to know that household size has a lower limit of 1 and an upper limit of 3 members.
haveKids
Fewer participants have kids than do not.
age
There doesn’t seem to be a clear pattern here. Perhaps I could rearrange the data in ascending order to get a better gauge, or peg it against another variable to support a hypothesis.
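One way to make the age distribution easier to read is to group ages into bands listed in ascending order. The sketch below is only an illustration: the ages in toy_ages are made up, and the 10-year band width and the 10 to 70 range are my own assumptions rather than properties of the real data.

```r
# Made-up ages for illustration only; NOT taken from Participants.csv.
toy_ages <- c(18, 23, 37, 41, 29, 56, 60, 34, 45, 19)

# cut() assigns each age to a 10-year band; right = FALSE makes bands [lo, hi).
age_band <- cut(
  toy_ages,
  breaks = seq(10, 70, by = 10),
  right = FALSE
)

# table() counts participants per band, with bands listed in ascending order.
band_counts <- table(age_band)
print(band_counts)
```

With far fewer bars than a plot of every individual age, any trend across age groups should be easier to spot.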
educationLevel
It’s interesting to see that the bulk of people went to HighSchoolOrCollege, which is quite confusing since College and Bachelors mean the same thing.
interestGroup
Again, this is not the most helpful on its own. I’ll definitely have to peg it against another variable to give it more meaning.
joviality
It’s nice to know that most people have a joviality score above 0.45.
I think the most interesting thing to understand here is what determines whether or not one has kids. However, given my time constraints, I will leave this for further exploration in future exercises. More specifically, I would like to examine the association between haveKids and each of educationLevel, interestGroup and joviality.
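As a starting point for that follow-up, here is a minimal sketch of how the comparison could be set up in base R. It is not a correlation coefficient: since haveKids and educationLevel are categorical, it uses a cross-tabulation with row-wise proportions, and for joviality it compares group means. The data frame toy_participants is made up, and I am assuming haveKids is stored as TRUE/FALSE.

```r
# Made-up rows for illustration only; NOT the real Participants.csv values.
toy_participants <- data.frame(
  haveKids = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  educationLevel = c("Bachelors", "HighSchoolOrCollege", "HighSchoolOrCollege",
                     "HighSchoolOrCollege", "Bachelors", "Bachelors",
                     "HighSchoolOrCollege", "Bachelors"),
  joviality = c(0.62, 0.35, 0.48, 0.71, 0.22, 0.90, 0.51, 0.44)
)

# Cross-tabulate education level (rows) against haveKids (columns).
kids_by_education <- table(toy_participants$educationLevel,
                           toy_participants$haveKids)

# Row-wise proportions: share with and without kids within each education level.
kids_share <- prop.table(kids_by_education, margin = 1)
print(kids_share)

# Mean joviality for participants without (FALSE) and with (TRUE) kids.
joviality_by_kids <- tapply(toy_participants$joviality,
                            toy_participants$haveKids, mean)
print(joviality_by_kids)
```

The same pattern would apply to interestGroup by swapping it in for educationLevel. Whether any difference is meaningful would still need a proper test on the real data.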