Nesting and Flattening

Some of the fields returned by the Ad Library API are converted by Radlibrary into list columns or nested tibbles. Other fields are flattened into multiple columns.

library(Radlibrary)
library(dplyr) # select(), contains(), %>%
library(tidyr) # unnest()

query <- adlib_build_query(
  ad_reached_countries = "US",
  search_terms = "election",
  limit = 3,
  fields = c(
    "id",
    "publisher_platforms",
    "demographic_distribution",
    "impressions"
  )
)
response <- adlib_get(query)
data <- as_tibble(response)
head(data)
#> # A tibble: 3 × 5
#>   id      publisher_platforms impressions_lower impressions_upper
#>   <chr>   <list>                          <dbl>             <dbl>
#> 1 fake123 <chr [2]>                        1000              4999
#> 2 fake456 <chr [2]>                           0               999
#> 3 fake789 <chr [1]>                           0               999
#> # ℹ 1 more variable: demographic_distribution <list>

This query returns 5 columns. Column 1, id, is an ordinary character vector. Column 2, publisher_platforms, is a list column: each entry holds the platforms on which the ad appeared. Columns 3 and 4 are ordinary numeric vectors, discussed in the Flattened columns section below. The last column, demographic_distribution, is also nested, but each entry is a tibble rather than a simple list.

Both of these nested columns can be unnested using tidyr’s unnest.

data %>%
  select(-demographic_distribution) %>%
  unnest(publisher_platforms)
#> # A tibble: 5 × 4
#>   id      publisher_platforms impressions_lower impressions_upper
#>   <chr>   <chr>                           <dbl>             <dbl>
#> 1 fake123 facebook                         1000              4999
#> 2 fake123 messenger                        1000              4999
#> 3 fake456 facebook                            0               999
#> 4 fake456 instagram                           0               999
#> 5 fake789 facebook                            0               999

Note that this creates multiple rows for ads that appeared on multiple platforms. Interpret this dataset with caution: the granularity of the other columns has not increased. For instance, it’s not necessarily the case that the ad with id fake123 has over 1,000 impressions on each of Facebook and Messenger. We can only say that the total impressions for this ad, summed across both platforms, is between 1,000 and 4,999.
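The double-counting risk can be sketched with a small hand-built tibble. The `mock` object below is hypothetical: it stands in for `data` and mirrors the example output, so that the snippet runs without an API token. It compares a sum taken after unnesting with a sum taken at the ad level, and shows one way to count platforms per ad without changing the granularity.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `data`, mirroring the example output above.
mock <- tibble(
  id = c("fake123", "fake456", "fake789"),
  publisher_platforms = list(
    c("facebook", "messenger"),
    c("facebook", "instagram"),
    "facebook"
  ),
  impressions_lower = c(1000, 0, 0),
  impressions_upper = c(4999, 999, 999)
)

# Summing after unnesting counts each multi-platform ad once per platform.
inflated_upper <- mock %>%
  unnest(publisher_platforms) %>%
  summarise(total = sum(impressions_upper)) %>%
  pull(total)

# Summing at the ad level keeps one row per ad.
ad_level_upper <- sum(mock$impressions_upper)

# Count platforms per ad while keeping one row per ad.
platform_counts <- mock %>%
  mutate(n_platforms = lengths(publisher_platforms)) %>%
  select(id, n_platforms)
```

Here `inflated_upper` is larger than `ad_level_upper` purely because the rows for fake123 and fake456 were duplicated by the unnest.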

The nested tibble column can be unnested in exactly the same way. To avoid confusion about granularity, we’ll drop the other columns and keep only id.

data %>%
  select(-publisher_platforms, -contains("impressions")) %>%
  unnest(demographic_distribution)
#> # A tibble: 13 × 4
#>    id      percentage age   gender 
#>    <chr>        <dbl> <chr> <chr>  
#>  1 fake456    0.00220 35-44 unknown
#>  2 fake456    0.00220 55-64 unknown
#>  3 fake456    0.0264  35-44 female 
#>  4 fake456    0.104   55-64 female 
#>  5 fake456    0.00440 25-34 female 
#>  6 fake456    0.0705  45-54 female 
#>  7 fake456    0.163   55-64 male   
#>  8 fake456    0.0374  25-34 male   
#>  9 fake456    0.0727  35-44 male   
#> 10 fake456    0.220   65+   male   
#> 11 fake456    0.119   45-54 male   
#> 12 fake456    0.176   65+   female 
#> 13 fake456    0.00220 65+   unknown

Another word of caution: by default, unnest drops rows whose nested value is NULL. Since demographic_distribution is not available for ads fake123 and fake789, these ads are dropped from the resulting dataset. To keep them, set keep_empty = TRUE.

data %>%
  select(-publisher_platforms, -contains("impressions")) %>%
  unnest(demographic_distribution, keep_empty = TRUE)
#> # A tibble: 15 × 4
#>    id      percentage age   gender 
#>    <chr>        <dbl> <chr> <chr>  
#>  1 fake123   NA       NA    NA     
#>  2 fake456    0.00220 35-44 unknown
#>  3 fake456    0.00220 55-64 unknown
#>  4 fake456    0.0264  35-44 female 
#>  5 fake456    0.104   55-64 female 
#>  6 fake456    0.00440 25-34 female 
#>  7 fake456    0.0705  45-54 female 
#>  8 fake456    0.163   55-64 male   
#>  9 fake456    0.0374  25-34 male   
#> 10 fake456    0.0727  35-44 male   
#> 11 fake456    0.220   65+   male   
#> 12 fake456    0.119   45-54 male   
#> 13 fake456    0.176   65+   female 
#> 14 fake456    0.00220 65+   unknown
#> 15 fake789   NA       NA    NA
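Before unnesting, it can help to check which ads would be dropped. The following is a minimal sketch, assuming that missing entries are stored as NULL in the list column, as described above. The `mock` tibble and its two demographic rows are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `data`: two ads have no demographic_distribution.
mock <- tibble(
  id = c("fake123", "fake456", "fake789"),
  demographic_distribution = list(
    NULL,
    tibble(
      percentage = c(0.6, 0.4),
      age = c("25-34", "35-44"),
      gender = c("female", "male")
    ),
    NULL
  )
)

# Ads whose nested entry is NULL; these disappear under the default unnest.
missing_ids <- mock %>%
  filter(vapply(demographic_distribution, is.null, logical(1))) %>%
  pull(id)

# Default behaviour drops the NULL entries; keep_empty = TRUE keeps them as NA rows.
dropped <- unnest(mock, demographic_distribution)
kept <- unnest(mock, demographic_distribution, keep_empty = TRUE)
```

Comparing `nrow(dropped)` with `nrow(kept)` shows how many ads were removed by the default behaviour.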

Careful about combinatorial explosion with unnesting

Unnesting multiple nested columns at the same time can create an undesired combinatorial explosion. For example,

data %>%
  select(id, publisher_platforms, demographic_distribution) %>%
  unnest(publisher_platforms, keep_empty = TRUE) %>%
  unnest(demographic_distribution, keep_empty = TRUE)
#> # A tibble: 29 × 5
#>    id      publisher_platforms percentage age   gender 
#>    <chr>   <chr>                    <dbl> <chr> <chr>  
#>  1 fake123 facebook              NA       NA    NA     
#>  2 fake123 messenger             NA       NA    NA     
#>  3 fake456 facebook               0.00220 35-44 unknown
#>  4 fake456 facebook               0.00220 55-64 unknown
#>  5 fake456 facebook               0.0264  35-44 female 
#>  6 fake456 facebook               0.104   55-64 female 
#>  7 fake456 facebook               0.00440 25-34 female 
#>  8 fake456 facebook               0.0705  45-54 female 
#>  9 fake456 facebook               0.163   55-64 male   
#> 10 fake456 facebook               0.0374  25-34 male   
#> # ℹ 19 more rows

In this example, although we only have three unique ad IDs, we’ve got 29 rows. The ad fake123 shows up twice, because it has two publisher_platforms and no demographic_distribution; the ad fake456 shows up 26 times because it has two publisher platforms and 13 demographic categories; and the ad fake789 shows up once.
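One way to sidestep the explosion is to unnest each nested column into its own table, keyed by id, and join them only when a specific analysis needs it. This is a sketch of that idea rather than something Radlibrary does for you. The `mock` tibble is a hypothetical, shortened stand-in for `data`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `data` with two nested columns.
mock <- tibble(
  id = c("fake123", "fake456"),
  publisher_platforms = list(
    c("facebook", "messenger"),
    c("facebook", "instagram")
  ),
  demographic_distribution = list(
    NULL,
    tibble(
      percentage = c(0.6, 0.4),
      age = c("25-34", "35-44"),
      gender = c("female", "male")
    )
  )
)

# One long table per nested column, each keyed by ad id.
platforms <- mock %>%
  select(id, publisher_platforms) %>%
  unnest(publisher_platforms)

demographics <- mock %>%
  select(id, demographic_distribution) %>%
  unnest(demographic_distribution)

# For contrast: chaining both unnests multiplies the rows within each ad.
exploded <- mock %>%
  unnest(publisher_platforms) %>%
  unnest(demographic_distribution, keep_empty = TRUE)
```

Each of `platforms` and `demographics` has a clear granularity (one row per ad and platform, or one row per ad and demographic group), whereas `exploded` mixes the two.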

Documentation of all field types

The full set of available fields is documented in the official API documentation. In general, fields of type list<string> are converted to list columns, while fields of type list<AudienceDistribution> are converted to nested tibbles.

Flattened columns

Some fields are returned as a list containing a lower bound and an upper bound. In the official API documentation these are called fields of type InsightsRangeValue. Radlibrary flattens each of these into a lower and an upper column. In this example, that applies to the impressions field, which is flattened to impressions_lower and impressions_upper. In general, InsightsRangeValue fields are flattened to columns named <field name>_lower and <field name>_upper.
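To make the naming convention concrete, here is a minimal base R sketch of this kind of flattening. It is not Radlibrary’s own implementation: `flatten_range` is a hypothetical helper, and `raw` is a hand-built list shaped like the raw records shown in the last section, where the bounds arrive as strings named lower_bound and upper_bound.

```r
# Hypothetical raw records shaped like entries of the raw response data.
raw <- list(
  list(id = "fake123", impressions = list(lower_bound = "1000", upper_bound = "4999")),
  list(id = "fake456", impressions = list(lower_bound = "0", upper_bound = "999"))
)

# Hypothetical helper: flatten one range field into <field>_lower / <field>_upper.
flatten_range <- function(records, field) {
  # Extract one bound from one record, as a number (NA if the field is absent).
  bound <- function(record, name) {
    value <- record[[field]][[name]]
    if (is.null(value)) NA_real_ else as.numeric(value)
  }
  out <- data.frame(
    lower = vapply(records, bound, numeric(1), name = "lower_bound"),
    upper = vapply(records, bound, numeric(1), name = "upper_bound")
  )
  names(out) <- paste0(field, c("_lower", "_upper"))
  out
}

flat <- flatten_range(raw, "impressions")
```

The resulting data frame has one row per record and the columns impressions_lower and impressions_upper, both numeric.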

Don’t forget, you still have the raw data

All of the data returned by the Ad Library API is kept in the response object. If the automatic transformations applied by as_tibble aren’t ideal for you, you can always work with the raw data and process it however you like.

response$data
#> [[1]]
#> [[1]]$id
#> [1] "fake123"
#> 
#> [[1]]$publisher_platforms
#> [[1]]$publisher_platforms[[1]]
#> [1] "facebook"
#> 
#> [[1]]$publisher_platforms[[2]]
#> [1] "messenger"
#> 
#> 
#> [[1]]$impressions
#> [[1]]$impressions$lower_bound
#> [1] "1000"
#> 
#> [[1]]$impressions$upper_bound
#> [1] "4999"
#> 
#> 
#> 
#> [[2]]
#> [[2]]$id
#> [1] "fake456"
#> 
#> [[2]]$publisher_platforms
#> [[2]]$publisher_platforms[[1]]
#> [1] "facebook"
#> 
#> [[2]]$publisher_platforms[[2]]
#> [1] "instagram"
#> 
#> 
#> [[2]]$demographic_distribution
#> [[2]]$demographic_distribution[[1]]
#> [[2]]$demographic_distribution[[1]]$percentage
#> [1] "0.002203"
#> 
#> [[2]]$demographic_distribution[[1]]$age
#> [1] "35-44"
#> 
#> [[2]]$demographic_distribution[[1]]$gender
#> [1] "unknown"
#> 
#> 
#> [[2]]$demographic_distribution[[2]]
#> [[2]]$demographic_distribution[[2]]$percentage
#> [1] "0.002203"
#> 
#> [[2]]$demographic_distribution[[2]]$age
#> [1] "55-64"
#> 
#> [[2]]$demographic_distribution[[2]]$gender
#> [1] "unknown"
#> 
#> 
#> [[2]]$demographic_distribution[[3]]
#> [[2]]$demographic_distribution[[3]]$percentage
#> [1] "0.026432"
#> 
#> [[2]]$demographic_distribution[[3]]$age
#> [1] "35-44"
#> 
#> [[2]]$demographic_distribution[[3]]$gender
#> [1] "female"
#> 
#> 
#> [[2]]$demographic_distribution[[4]]
#> [[2]]$demographic_distribution[[4]]$percentage
#> [1] "0.103524"
#> 
#> [[2]]$demographic_distribution[[4]]$age
#> [1] "55-64"
#> 
#> [[2]]$demographic_distribution[[4]]$gender
#> [1] "female"
#> 
#> 
#> [[2]]$demographic_distribution[[5]]
#> [[2]]$demographic_distribution[[5]]$percentage
#> [1] "0.004405"
#> 
#> [[2]]$demographic_distribution[[5]]$age
#> [1] "25-34"
#> 
#> [[2]]$demographic_distribution[[5]]$gender
#> [1] "female"
#> 
#> 
#> [[2]]$demographic_distribution[[6]]
#> [[2]]$demographic_distribution[[6]]$percentage
#> [1] "0.070485"
#> 
#> [[2]]$demographic_distribution[[6]]$age
#> [1] "45-54"
#> 
#> [[2]]$demographic_distribution[[6]]$gender
#> [1] "female"
#> 
#> 
#> [[2]]$demographic_distribution[[7]]
#> [[2]]$demographic_distribution[[7]]$percentage
#> [1] "0.162996"
#> 
#> [[2]]$demographic_distribution[[7]]$age
#> [1] "55-64"
#> 
#> [[2]]$demographic_distribution[[7]]$gender
#> [1] "male"
#> 
#> 
#> [[2]]$demographic_distribution[[8]]
#> [[2]]$demographic_distribution[[8]]$percentage
#> [1] "0.037445"
#> 
#> [[2]]$demographic_distribution[[8]]$age
#> [1] "25-34"
#> 
#> [[2]]$demographic_distribution[[8]]$gender
#> [1] "male"
#> 
#> 
#> [[2]]$demographic_distribution[[9]]
#> [[2]]$demographic_distribution[[9]]$percentage
#> [1] "0.072687"
#> 
#> [[2]]$demographic_distribution[[9]]$age
#> [1] "35-44"
#> 
#> [[2]]$demographic_distribution[[9]]$gender
#> [1] "male"
#> 
#> 
#> [[2]]$demographic_distribution[[10]]
#> [[2]]$demographic_distribution[[10]]$percentage
#> [1] "0.220264"
#> 
#> [[2]]$demographic_distribution[[10]]$age
#> [1] "65+"
#> 
#> [[2]]$demographic_distribution[[10]]$gender
#> [1] "male"
#> 
#> 
#> [[2]]$demographic_distribution[[11]]
#> [[2]]$demographic_distribution[[11]]$percentage
#> [1] "0.118943"
#> 
#> [[2]]$demographic_distribution[[11]]$age
#> [1] "45-54"
#> 
#> [[2]]$demographic_distribution[[11]]$gender
#> [1] "male"
#> 
#> 
#> [[2]]$demographic_distribution[[12]]
#> [[2]]$demographic_distribution[[12]]$percentage
#> [1] "0.176211"
#> 
#> [[2]]$demographic_distribution[[12]]$age
#> [1] "65+"
#> 
#> [[2]]$demographic_distribution[[12]]$gender
#> [1] "female"
#> 
#> 
#> [[2]]$demographic_distribution[[13]]
#> [[2]]$demographic_distribution[[13]]$percentage
#> [1] "0.002203"
#> 
#> [[2]]$demographic_distribution[[13]]$age
#> [1] "65+"
#> 
#> [[2]]$demographic_distribution[[13]]$gender
#> [1] "unknown"
#> 
#> 
#> 
#> [[2]]$impressions
#> [[2]]$impressions$lower_bound
#> [1] "0"
#> 
#> [[2]]$impressions$upper_bound
#> [1] "999"
#> 
#> 
#> 
#> [[3]]
#> [[3]]$id
#> [1] "fake789"
#> 
#> [[3]]$publisher_platforms
#> [[3]]$publisher_platforms[[1]]
#> [1] "facebook"
#> 
#> 
#> [[3]]$impressions
#> [[3]]$impressions$lower_bound
#> [1] "0"
#> 
#> [[3]]$impressions$upper_bound
#> [1] "999"
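As an illustration of processing the raw data by hand, the sketch below builds a demographic table for a single ad. The `record` object is hypothetical: it is shaped like the second entry of the raw data above, truncated to two demographic groups. Note that the raw percentages are strings, so they are converted to numbers explicitly.

```r
library(dplyr)

# Hypothetical record shaped like the second raw entry above (truncated).
record <- list(
  id = "fake456",
  demographic_distribution = list(
    list(percentage = "0.002203", age = "35-44", gender = "unknown"),
    list(percentage = "0.026432", age = "35-44", gender = "female")
  )
)

# Turn each demographic entry into a one-row tibble, stack them,
# convert the percentage strings to numbers, and attach the ad id.
demo <- record$demographic_distribution %>%
  lapply(as_tibble) %>%
  bind_rows() %>%
  mutate(percentage = as.numeric(percentage), id = record$id) %>%
  select(id, everything())
```

The same pattern can be wrapped in `lapply` over all raw records if a custom layout is needed for every ad.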