Visualising social networks in Engagement.
This take-home exercise aims to sharpen the skill of building data visualisations programmatically with the appropriate packages from the tidyverse family, and of preparing statistical graphics with ggplot2 and its extensions. The specific requirements can be found in the screenshot below.
You can find the links to the datasets here.
The code chunk below is used to install and load the required packages in RStudio.
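A minimal sketch of such a chunk, assuming the package list below (tidyverse, lubridate, tidygraph and ggraph cover the later steps; the exact list is an assumption). It installs only the missing packages from CRAN and then attaches all of them:

```r
# Assumed package list; adjust to match your own setup
pkgs <- c("tidyverse", "lubridate", "tidygraph", "ggraph")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach every package for the current session
invisible(lapply(pkgs, library, character.only = TRUE))
```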
The code chunk below is used to load the necessary data.
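A hedged sketch using readr's read_csv(). The file path and the column names (timestamp, participantIdFrom, participantIdTo) are assumptions, so to stay self-contained the block first writes a two-row stand-in file to a temporary location. Declaring col_types up front avoids relying on type guessing for a file with millions of rows:

```r
library(readr)

# Stand-in file: the real dataset path is not given here, so a two-row
# sample with assumed column names is written to a temporary location
edges_path <- tempfile(fileext = ".csv")
writeLines(c(
  "timestamp,participantIdFrom,participantIdTo",
  "2022-03-01T00:00:00Z,0,226",
  "2022-03-02T00:00:00Z,0,644"
), edges_path)

# Declaring column types keeps the IDs numeric and parses the
# timestamp as a date-time instead of leaving it as text
socialNetworksEdges <- read_csv(
  edges_path,
  col_types = cols(
    timestamp = col_datetime(),
    participantIdFrom = col_double(),
    participantIdTo = col_double()
  )
)
```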
We convert the timestamp in socialNetworksEdges into a date for consistency and ease of use. Also, because socialNetworksEdges has 7,482,488 rows, we first work with just one month of data (March 2022) to keep the analysis manageable, using the code chunk below.
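A sketch of the date conversion and the one-month filter with dplyr and lubridate, on a toy tibble whose column names are assumed. as_date() drops the time of day, and the filter keeps dates from 1 March up to, but not including, 1 April 2022:

```r
library(dplyr)
library(lubridate)

# Toy stand-in for socialNetworksEdges (column names assumed)
socialNetworksEdges <- tibble(
  participantIdFrom = c(0, 0, 1, 2),
  participantIdTo = c(226, 644, 5, 9),
  timestamp = ymd_hms(c("2022-03-01 08:00:00", "2022-03-31 23:00:00",
                        "2022-04-01 00:00:00", "2022-02-28 12:00:00"))
)

socialNetworksEdgesMar22 <- socialNetworksEdges %>%
  # Drop the time of day and keep only the calendar date
  mutate(date = as_date(timestamp)) %>%
  select(-timestamp) %>%
  # Keep March 2022; the upper bound is exclusive
  filter(date >= ymd("2022-03-01"), date < ymd("2022-04-01"))
```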
We add a weight column to the dataset socialNetworksEdgesMar22 with the code chunk below.
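The glimpse of the prepared edges has one row per participantIdFrom, participantIdTo and weekday, so a plausible reading is that weight counts how often a pair interacted on that weekday during the month. A minimal sketch under that assumption, using dplyr's count(); the filters that drop self-loops and one-off interactions are also assumptions:

```r
library(dplyr)
library(lubridate)

# Toy stand-in for socialNetworksEdgesMar22: one row per daily interaction
socialNetworksEdgesMar22 <- tibble(
  participantIdFrom = c(0, 0, 0, 0, 1),
  participantIdTo = c(226, 226, 226, 644, 1),
  date = ymd(c("2022-03-06", "2022-03-13", "2022-03-20",
               "2022-03-06", "2022-03-06"))
)

socialNetworksEdgesAgg <- socialNetworksEdgesMar22 %>%
  # Full weekday name, e.g. "Sunday" (locale-dependent)
  mutate(weekday = as.character(wday(date, label = TRUE, abbr = FALSE))) %>%
  # weight = number of interactions for a pair on a given weekday
  count(participantIdFrom, participantIdTo, weekday, name = "weight") %>%
  # Assumed filters: drop self-loops and one-off interactions
  filter(participantIdFrom != participantIdTo, weight > 1)
```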
Let us export and reimport the necessary datasets to minimise the load on Git later.
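A minimal sketch, assuming RDS as the export format via readr's write_rds() and read_rds(); the file name is hypothetical. Storing the reduced, prepared data as a compressed RDS file keeps the committed files much smaller than the raw CSV, and the round trip preserves column types:

```r
library(readr)
library(dplyr)

# Toy stand-in for one of the prepared datasets
edges <- tibble(
  participantIdFrom = c(0, 0),
  participantIdTo = c(226, 644),
  weekday = c("Sunday", "Monday"),
  weight = c(3, 2)
)

# Hypothetical path; a project would normally use its own data folder
rds_path <- file.path(tempdir(), "socialNetworksEdgesAgg.rds")

# RDS keeps column types intact; xz compression shrinks the file
write_rds(edges, rds_path, compress = "xz")
edges_reloaded <- read_rds(rds_path)
```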
Rows: 1,011
Columns: 7
$ participantId <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ householdSize <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ haveKids <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
$ age <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2…
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",…
$ interestGroup <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", …
$ joviality <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380…
Rows: 51,650
Columns: 4
$ participantIdFrom <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ participantIdTo <dbl> 226, 226, 226, 226, 226, 226, 226, 644, 64…
$ weekday <chr> "Sunday", "Monday", "Tuesday", "Wednesday"…
$ weight <dbl> 3, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, …
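Let us also add 1 to every column that holds a participantId (participantId in the nodes, and participantIdFrom and participantIdTo in the edges). This is because the tidygraph package reads numeric from/to values as 1-based row positions in the node table, so an ID of 0 has no matching node.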
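A sketch of the shift with dplyr's across() on toy tables. The closing check reflects an assumption worth verifying on the real data: the shifted IDs only line up with row positions if the node table is sorted by participantId and the IDs run from 0 without gaps.

```r
library(dplyr)

# Toy stand-ins for the node and edge tables
nodes <- tibble(participantId = c(0, 1, 2), age = c(36, 25, 35))
edges <- tibble(
  participantIdFrom = c(0, 0),
  participantIdTo = c(1, 2),
  weight = c(3, 2)
)

# Shift every ID column by one so the smallest ID becomes 1
nodes <- nodes %>% mutate(participantId = participantId + 1)
edges <- edges %>%
  mutate(across(c(participantIdFrom, participantIdTo), ~ .x + 1))

# Sanity check: each shifted ID should equal its row number
stopifnot(all(nodes$participantId == seq_len(nrow(nodes))))
```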
Now that we have the edges and nodes, we create the network graph with the code chunk below.
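A minimal sketch with tidygraph's tbl_graph() on toy tables. Because the edge table's first two columns hold the shifted participant IDs, they are taken as from/to positions in the node table:

```r
library(dplyr)
library(tidygraph)

# Toy node and edge tables with IDs already shifted to start at 1
nodes <- tibble(participantId = c(1, 2, 3), householdSize = c(3, 3, 2))
edges <- tibble(
  participantIdFrom = c(1, 1, 2),
  participantIdTo = c(2, 3, 3),
  weekday = c("Sunday", "Monday", "Sunday"),
  weight = c(3, 3, 2)
)

# The first two edge columns are read as from/to row positions in `nodes`
network <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)
network
```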
# A tbl_graph: 1011 nodes and 51650 edges
#
# A directed multigraph with 135 components
#
# Node Data: 1,011 × 7 (active)
participantId householdSize haveKids age educationLevel
<dbl> <dbl> <lgl> <dbl> <chr>
1 1 3 TRUE 36 HighSchoolOrC…
2 2 3 TRUE 25 HighSchoolOrC…
3 3 3 TRUE 35 HighSchoolOrC…
4 4 3 TRUE 21 HighSchoolOrC…
5 5 3 TRUE 43 Bachelors
6 6 3 TRUE 32 HighSchoolOrC…
# … with 1,005 more rows, and 2 more variables: interestGroup <chr>,
# joviality <dbl>
#
# Edge Data: 51,650 × 4
from to weekday weight
<int> <int> <chr> <dbl>
1 1 227 Sunday 3
2 1 227 Monday 3
3 1 227 Tuesday 2
# … with 51,647 more rows
Now we activate the edge data with the code chunk below, so that subsequent tidygraph verbs operate on the edges rather than the nodes.
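Judging from the printed output, the edge table becomes active and the edges with the highest weight (5) are listed first, so the sketch below assumes activate(edges) followed by arrange(desc(weight)). This is consistent with the weekdays shown: March 2022 began on a Tuesday, so only Tuesdays, Wednesdays and Thursdays occur five times that month, and the weight-5 rows fall on Wednesday and Thursday.

```r
library(dplyr)
library(tidygraph)

# Toy graph; names avoid clashing with the `edges` keyword used by activate()
toy_nodes <- tibble(participantId = c(1, 2, 3))
toy_edges <- tibble(
  from = c(1, 2, 3),
  to = c(2, 3, 1),
  weekday = c("Sunday", "Wednesday", "Thursday"),
  weight = c(2, 5, 3)
)
network <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Switch the active table to edges, then list the strongest ties first
network <- network %>%
  activate(edges) %>%
  arrange(desc(weight))
network
```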
# A tbl_graph: 1011 nodes and 51650 edges
#
# A directed multigraph with 135 components
#
# Edge Data: 51,650 × 4 (active)
from to weekday weight
<int> <int> <chr> <dbl>
1 2 845 Thursday 5
2 3 219 Thursday 5
3 6 95 Wednesday 5
4 6 95 Thursday 5
5 6 97 Thursday 5
6 7 203 Thursday 5
# … with 51,644 more rows
#
# Node Data: 1,011 × 7
participantId householdSize haveKids age educationLevel
<dbl> <dbl> <lgl> <dbl> <chr>
1 1 3 TRUE 36 HighSchoolOrC…
2 2 3 TRUE 25 HighSchoolOrC…
3 3 3 TRUE 35 HighSchoolOrC…
# … with 1,008 more rows, and 2 more variables: interestGroup <chr>,
# joviality <dbl>
We draw the graph for household size using the code chunk below.
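A minimal sketch of such a plot with ggraph, assuming a force-directed ("fr") layout, edge width mapped to weight, and one panel per weekday via facet_edges(). plot_network_by() is a hypothetical helper, and base_family = "sans" sidesteps theme_graph()'s default Arial Narrow font, which is not installed everywhere:

```r
library(dplyr)
library(ggplot2)
library(tidygraph)
library(ggraph)

# Toy network standing in for the real one
toy_nodes <- tibble(participantId = 1:4, householdSize = c(1, 2, 3, 3))
toy_edges <- tibble(
  from = c(1, 1, 2, 3, 4),
  to = c(2, 3, 4, 4, 1),
  weekday = c("Friday", "Friday", "Monday", "Friday", "Monday"),
  weight = c(3, 2, 2, 5, 4)
)
network <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Hypothetical helper: colour nodes by any node attribute, one panel per weekday
plot_network_by <- function(graph, var) {
  ggraph(graph, layout = "fr") +
    # Faint, weight-scaled edges reduce clutter in dense panels
    geom_edge_link(aes(width = weight), alpha = 0.2) +
    scale_edge_width(range = c(0.1, 2)) +
    geom_node_point(aes(colour = {{ var }}), size = 2) +
    facet_edges(~weekday) +
    theme_graph(base_family = "sans")
}

p <- plot_network_by(network, householdSize)
```

Printing p draws the faceted figure. On the real network the same helper would cover the later graphs, e.g. plot_network_by(network, haveKids) or plot_network_by(network, joviality), although with 51,650 edges the alpha and width range will likely need tuning.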
In general, smaller households appear to interact more than larger households, and the interaction between smaller households is especially intense on Fridays.
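To back the visual impression with numbers, a minimal sketch on a toy network: it looks up the household size of each edge's source node with tidygraph's .N() and totals edge weight per weekday. Raw totals also depend on how many participants fall into each household size, so on the real data it would be fairer to divide by group size before comparing.

```r
library(dplyr)
library(tidygraph)

toy_nodes <- tibble(participantId = 1:4, householdSize = c(1, 1, 3, 3))
toy_edges <- tibble(
  from = c(1, 2, 3, 1),
  to = c(2, 1, 4, 2),
  weekday = c("Friday", "Friday", "Friday", "Monday"),
  weight = c(4, 3, 2, 2)
)
network <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

weight_by_size <- network %>%
  activate(edges) %>%
  # .N() exposes the node table; `from` holds row positions into it
  mutate(from_size = .N()$householdSize[from]) %>%
  as_tibble() %>%
  group_by(weekday, from_size) %>%
  summarise(total_weight = sum(weight), n_edges = n(), .groups = "drop")
weight_by_size
```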
We draw the graph for haveKids using the code chunk below.
In general, participants without kids appear to interact with each other a lot more. This could be because they have fewer responsibilities and tasks. It could also be because any children among the participants would also be recorded as having no kids, which might skew the statistic slightly.
We draw the graph for age using the code chunk below.
There’s quite an even mix in terms of interaction between people of all ages.
We draw the graph for education level using the code chunk below.
There seems to be an even mix of interaction across all education levels. However, participants in the HighSchoolOrCollege group tend to be less socially interactive than the others.
We draw the graph for interest group using the code chunk below.
There does not seem to be a discernible pattern between participants and their interest groups.
We draw the graph for joviality using the code chunk below.
In general, happier participants (those with higher joviality) tend to be more socially interactive, which suggests the two factors may be positively correlated.
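A minimal sketch of how the suspected correlation could be checked, on a toy network: compute each participant's weighted degree with tidygraph's centrality_degree() and take its Spearman rank correlation with joviality. On the real graph each edge is a participant pair on one weekday, so degree counts pair-weekday combinations weighted by interaction count; a correlation alone would also not show which factor drives the other.

```r
library(dplyr)
library(tidygraph)

toy_nodes <- tibble(participantId = 1:5, joviality = c(0.1, 0.3, 0.5, 0.7, 0.9))
toy_edges <- tibble(
  from   = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
  to     = c(2, 3, 4, 5, 5, 1, 1, 2, 3),
  weight = c(2, 1, 3, 1, 2, 1, 4, 2, 3)
)
network <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

node_stats <- network %>%
  activate(nodes) %>%
  # Weighted degree: total weight of incoming and outgoing edges
  mutate(degree = centrality_degree(weights = weight, mode = "all")) %>%
  as_tibble()

# Rank correlation avoids assuming a linear relationship
rho <- cor(node_stats$degree, node_stats$joviality, method = "spearman")
rho
```

For a p-value, cor.test() with method = "spearman" can be used on the same two columns.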
This was admittedly an amateur attempt at building a network, and I struggled to get tidygraph to work. Future improvements will include learning how to handle this many data points (51,650 edges) and configuring the network graphs so that they are less cluttered.