Take-home Exercise 6

Visualising social networks in Engagement.

Yeo Kim Siang https://www.linkedin.com/in/kim-siang-yeo-b42317134/ (Singapore Management University) https://scis.smu.edu.sg/master-it-business
2022-06-05

The Task

This take-home exercise aims to sharpen the skill of building data visualisations programmatically using the appropriate tidyverse family of packages, and of preparing statistical graphics using ggplot2 and its extensions. The specific requirements can be found in the screenshot below.

You can find the links to the datasets here.

Exploration

Initialisation

Getting Packages

The code chunk below is used to install and load the required packages onto RStudio.
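
A minimal sketch of such a chunk could look like the following. The package list is an assumption inferred from the functions this post relies on; the original chunk may have loaded a different set (for example the whole tidyverse), and the CRAN mirror URL is simply one common choice.

```r
# Assumed package list; adjust it to match the actual analysis.
packages <- c("dplyr", "readr", "lubridate", "tidygraph", "ggraph")

for (p in packages) {
  # Install only when the package is not already available.
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org")
  }
  # character.only = TRUE lets library() take the name from a variable.
  library(p, character.only = TRUE)
}
```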

Getting Data

The code chunk below is used to load the necessary data.

We convert the timestamp in socialNetworksEdges into a date for consistency and ease of use. Also, given that socialNetworksEdges contains 7,482,488 rows, we start with the data for just one month to keep the analysis manageable, using the code chunk below.
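
As a rough sketch of this step on a toy data frame: the timestamp format (ISO 8601 strings) and the choice of March 2022 are assumptions, the latter inferred from the name socialNetworksEdgesMar22. The rows themselves are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for socialNetworksEdges; values are invented.
socialNetworksEdges <- tibble(
  participantIdFrom = c(0, 0, 1, 2),
  participantIdTo   = c(226, 226, 5, 7),
  timestamp = c("2022-03-01T00:00:00Z", "2022-03-08T00:00:00Z",
                "2022-03-02T00:00:00Z", "2022-04-01T00:00:00Z")
)

socialNetworksEdgesMar22 <- socialNetworksEdges %>%
  # Parse the ISO 8601 string, then keep only the calendar date.
  mutate(timestamp = as_date(ymd_hms(timestamp))) %>%
  # Keep a single month (assumed here to be March 2022).
  filter(year(timestamp) == 2022, month(timestamp) == 3) %>%
  # Derive the day of the week; the labels depend on the session locale.
  mutate(weekday = weekdays(timestamp))

socialNetworksEdgesMar22
```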

We add a weight column to the dataset socialNetworksEdgesMar22 with the code chunk below.
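
One plausible way to derive such a column is to count how often each pair of participants interacts on each weekday. This is an assumption about how the weight was defined, not a confirmed reproduction of the original chunk, and the toy rows are invented.

```r
library(dplyr)

# Toy stand-in for the March 2022 edges, one row per interaction.
socialNetworksEdgesMar22 <- tibble(
  participantIdFrom = c(0, 0, 0, 1),
  participantIdTo   = c(226, 226, 226, 5),
  weekday           = c("Sunday", "Sunday", "Monday", "Sunday")
)

# Collapse repeated interactions into one row per (from, to, weekday)
# and store the number of occurrences as the edge weight.
socialNetworksEdgesMar22 <- socialNetworksEdgesMar22 %>%
  count(participantIdFrom, participantIdTo, weekday, name = "weight")

socialNetworksEdgesMar22
```

Aggregating this way also shrinks the edge list considerably, since many raw rows collapse into a single weighted edge.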

Let us export and reimport the necessary datasets to minimise the load on Git later.

Rows: 1,011
Columns: 7
$ participantId  <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ householdSize  <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ haveKids       <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
$ age            <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2…
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",…
$ interestGroup  <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", …
$ joviality      <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380…
Rows: 51,650
Columns: 4
$ participantIdFrom <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ participantIdTo   <dbl> 226, 226, 226, 226, 226, 226, 226, 644, 64…
$ weekday           <chr> "Sunday", "Monday", "Tuesday", "Wednesday"…
$ weight            <dbl> 3, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, …
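
The export and reimport step could be sketched as below. The file name and the use of a temporary directory are hypothetical; the point is only that the small aggregated table is saved and read back, so the large raw file does not need to be committed.

```r
library(dplyr)
library(readr)

# Toy stand-in for the aggregated edge table; values are invented.
edges <- tibble(
  participantIdFrom = c(0, 0, 1),
  participantIdTo   = c(226, 644, 5),
  weekday           = c("Sunday", "Monday", "Sunday"),
  weight            = c(3, 2, 2)
)

# Hypothetical output path; a project would normally use its data folder.
path <- file.path(tempdir(), "socialNetworksEdgesMar22.rds")

write_rds(edges, path)            # export the reduced dataset
edges_reloaded <- read_rds(path)  # reimport it for the rest of the analysis

glimpse(edges_reloaded)
```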

Let us also add 1 to all the columns that relate to a participantId. This is because the tidygraph package does not accept the value 0: it treats numeric edge endpoints as 1-based row positions in the node table.
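
A minimal sketch of the shift, using toy tables with the same column names as the real ones (the values are invented):

```r
library(dplyr)

# Toy node and edge tables with zero-based participant IDs.
nodes <- tibble(participantId = c(0, 1, 2), householdSize = c(3, 3, 2))
edges <- tibble(
  participantIdFrom = c(0, 0, 1),
  participantIdTo   = c(1, 2, 2),
  weight            = c(3, 2, 2)
)

# Shift the node IDs so that they start at 1.
nodes <- nodes %>%
  mutate(participantId = participantId + 1)

# Apply the same shift to both endpoint columns of the edges.
edges <- edges %>%
  mutate(across(c(participantIdFrom, participantIdTo), ~ .x + 1))
```

Both tables must be shifted together, otherwise the edges would point at the wrong participants.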

Now that we have the edges and nodes, we create the network graph with the code chunk below.

# A tbl_graph: 1011 nodes and 51650 edges
#
# A directed multigraph with 135 components
#
# Node Data: 1,011 × 7 (active)
  participantId householdSize haveKids   age educationLevel
          <dbl>         <dbl> <lgl>    <dbl> <chr>         
1             1             3 TRUE        36 HighSchoolOrC…
2             2             3 TRUE        25 HighSchoolOrC…
3             3             3 TRUE        35 HighSchoolOrC…
4             4             3 TRUE        21 HighSchoolOrC…
5             5             3 TRUE        43 Bachelors     
6             6             3 TRUE        32 HighSchoolOrC…
# … with 1,005 more rows, and 2 more variables: interestGroup <chr>,
#   joviality <dbl>
#
# Edge Data: 51,650 × 4
   from    to weekday weight
  <int> <int> <chr>    <dbl>
1     1   227 Sunday       3
2     1   227 Monday       3
3     1   227 Tuesday      2
# … with 51,647 more rows
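
For reference, a graph like this could be built along the following lines. This is a sketch on toy data rather than the original chunk; renaming the endpoint columns to from and to is an assumption based on the column names in the printed edge data.

```r
library(dplyr)
library(tidygraph)

# Toy node and edge tables (IDs already shifted to start at 1).
nodes <- tibble(
  participantId = 1:4,
  householdSize = c(3, 3, 2, 1)
)
edges <- tibble(
  participantIdFrom = c(1, 1, 2),
  participantIdTo   = c(2, 3, 3),
  weekday           = c("Sunday", "Monday", "Friday"),
  weight            = c(3, 2, 5)
)

graph <- tbl_graph(
  nodes = nodes,
  # Endpoints are given as integer row positions in the node table.
  edges = edges %>%
    rename(from = participantIdFrom, to = participantIdTo) %>%
    mutate(across(c(from, to), as.integer)),
  directed = TRUE
)

graph
```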

Now we activate the edges of the graph with the code chunk below.

# A tbl_graph: 1011 nodes and 51650 edges
#
# A directed multigraph with 135 components
#
# Edge Data: 51,650 × 4 (active)
   from    to weekday   weight
  <int> <int> <chr>      <dbl>
1     2   845 Thursday       5
2     3   219 Thursday       5
3     6    95 Wednesday      5
4     6    95 Thursday       5
5     6    97 Thursday       5
6     7   203 Thursday       5
# … with 51,644 more rows
#
# Node Data: 1,011 × 7
  participantId householdSize haveKids   age educationLevel
          <dbl>         <dbl> <lgl>    <dbl> <chr>         
1             1             3 TRUE        36 HighSchoolOrC…
2             2             3 TRUE        25 HighSchoolOrC…
3             3             3 TRUE        35 HighSchoolOrC…
# … with 1,008 more rows, and 2 more variables: interestGroup <chr>,
#   joviality <dbl>
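
Judging by the output above, the edges appear to have been sorted by descending weight after activation. A sketch of that step on a toy graph, with the sorting being an inference from the output rather than something the post states:

```r
library(dplyr)
library(tidygraph)

# Toy graph; values are invented.
nodes <- tibble(participantId = 1:4)
edges <- tibble(
  from   = c(1L, 1L, 2L),
  to     = c(2L, 3L, 3L),
  weight = c(3, 2, 5)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

graph_sorted <- graph %>%
  activate(edges) %>%        # subsequent verbs now act on the edge table
  arrange(desc(weight))      # heaviest interactions first

graph_sorted
```

After activate(edges), verbs such as arrange(), filter() and mutate() operate on the edge data until the nodes are activated again.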

Visualisation

Insights

Household Size

We draw the graph for household size using the code chunk below.
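
A plot of this kind could be sketched with ggraph as follows. The layout, the edge-width scale and the faceting by weekday are assumptions about the original chart, and the toy graph is invented. The same pattern would apply to the other attributes by swapping the column mapped to colour.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

# Toy graph; values are invented.
nodes <- tibble(
  participantId = 1:4,
  householdSize = c(3, 3, 2, 1)
)
edges <- tibble(
  from    = c(1L, 1L, 2L, 3L),
  to      = c(2L, 3L, 3L, 4L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  weight  = c(3, 2, 5, 2)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

set.seed(1)  # the force-directed layout is randomised
p <- ggraph(graph, layout = "fr") +
  # Thicker, semi-transparent lines for heavier interactions.
  geom_edge_link(aes(width = weight), alpha = 0.3) +
  scale_edge_width(range = c(0.2, 2)) +
  # Colour each participant by household size.
  geom_node_point(aes(colour = factor(householdSize)), size = 3) +
  # One panel per weekday; nodes are repeated, edges are split.
  facet_edges(~weekday) +
  labs(colour = "householdSize") +
  theme_void()

p
```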

In general, it does seem like smaller households interact more than larger households. In addition, the interaction between smaller households is particularly intense on Fridays.

Have Kids

We draw the graph for haveKids using the code chunk below.

In general, it does seem like single individuals hang out with each other a lot more. This could be because single people have fewer responsibilities and tasks. It could also be because children count as single people as well, which might skew the statistics a little.

Age

We draw the graph for age using the code chunk below.

There’s quite an even mix in terms of interaction between people of all ages.

Education Level

We draw the graph for education level using the code chunk below.

There seems to be an even mix between people of all education levels. However, graduates from high school or college tend to be less socially interactive than the others.

Interest Group

We draw the graph for interest group using the code chunk below.

There does not seem to be a discernible pattern between participants and their interest groups.

Joviality

We draw the graph for joviality using the code chunk below.

In general, the happier participants tend to be more socially interactive. Perhaps there is a correlation between the two factors.
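
One way to check this beyond eyeballing the graph would be to compute each participant's weighted degree and correlate it with joviality. The sketch below does this on a toy graph; the choice of weighted degree as the measure of social interaction and of Spearman's rank correlation are both assumptions, and the values are invented.

```r
library(dplyr)
library(tidygraph)

# Toy graph; values are invented.
nodes <- tibble(
  participantId = 1:4,
  joviality     = c(0.3, 0.6, 0.8, 0.1)
)
edges <- tibble(
  from   = c(1L, 1L, 2L, 3L),
  to     = c(2L, 3L, 3L, 4L),
  weight = c(3, 2, 5, 2)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

node_stats <- graph %>%
  activate(nodes) %>%
  # Sum of edge weights touching each node, ignoring direction.
  mutate(strength = centrality_degree(weights = weight, mode = "all")) %>%
  as_tibble()

# Rank correlation between interaction strength and joviality.
r <- cor(node_stats$strength, node_stats$joviality, method = "spearman")
r
```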

Conclusion

This was admittedly a very amateur attempt at building a network, and I did struggle with getting tidygraph to work. Future improvements will include learning how to manage this many edges (51,650) and configuring the network graphs so that they are less cluttered.