Some of the fields returned by the Ad Library API are converted by Radlibrary into list columns or nested tibbles. Other fields are flattened into multiple columns.
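library(Radlibrary)
library(dplyr) # %>%, select(), as_tibble()
library(tidyr) # unnest()

# Request three ads, including fields that come back nested or as ranges.
query <- adlib_build_query(
  ad_reached_countries = "US",
  search_terms = "election",
  limit = 3,
  fields = c(
    "id",
    "publisher_platforms",
    "demographic_distribution",
    "impressions"
  )
)
response <- adlib_get(query)
data <- as_tibble(response)
head(data)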
#> # A tibble: 3 × 5
#> id publisher_platforms impressions_lower impressions_upper
#> <chr> <list> <dbl> <dbl>
#> 1 fake123 <chr [2]> 1000 4999
#> 2 fake456 <chr [2]> 0 999
#> 3 fake789 <chr [1]> 0 999
#> # ℹ 1 more variable: demographic_distribution <list>
This query returns 5 columns. Column 1, id, is an ordinary character
vector. Column 2, publisher_platforms, is a list column: each entry is a
character vector of the platforms on which the ad appeared. Columns 3
and 4 are ordinary numeric vectors, discussed under “Flattened columns”
below. The last column, demographic_distribution, is also nested, but
each entry is a tibble rather than a simple vector.
Both of these nested columns can be unnested using tidyr’s unnest().
data %>%
  select(-demographic_distribution) %>%
  unnest(publisher_platforms)
#> # A tibble: 5 × 4
#> id publisher_platforms impressions_lower impressions_upper
#> <chr> <chr> <dbl> <dbl>
#> 1 fake123 facebook 1000 4999
#> 2 fake123 messenger 1000 4999
#> 3 fake456 facebook 0 999
#> 4 fake456 instagram 0 999
#> 5 fake789 facebook 0 999
Note that this creates multiple rows for ads that appeared on
multiple platforms. Interpret the result with caution: the other columns
are still measured per ad, not per platform. For instance, it’s not
necessarily the case that the ad with id fake123 has at least 1,000
impressions on each of Facebook and Messenger. We can only say that its
total impressions across all platforms are between 1,000 and 4,999.
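If one row per ad is what the analysis needs, an alternative to unnesting is to summarise the list column in place. The following is only a sketch: ads is a hand-built stand-in for the data tibble above (values copied from the printed output), and the n_platforms and platforms column names are hypothetical.

```r
library(tibble)
library(dplyr)

# Stand-in for `data`: values copied from the printed output above.
ads <- tibble(
  id = c("fake123", "fake456", "fake789"),
  publisher_platforms = list(
    c("facebook", "messenger"),
    c("facebook", "instagram"),
    "facebook"
  ),
  impressions_lower = c(1000, 0, 0),
  impressions_upper = c(4999, 999, 999)
)

# Keep one row per ad so the impression bounds stay at ad-level granularity.
ads_flat <- ads %>%
  mutate(
    # Number of platforms each ad appeared on.
    n_platforms = lengths(publisher_platforms),
    # Platforms collapsed into a single comma-separated string.
    platforms = vapply(publisher_platforms, paste, character(1), collapse = ", ")
  ) %>%
  select(-publisher_platforms)

ads_flat
```

Because no rows are duplicated, the impressions bounds in ads_flat can be counted or summed per ad without double counting.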
The nested tibble column can be unnested in exactly the same way. To
avoid confusion about granularity, we’ll first drop publisher_platforms
and the impressions columns.
data %>%
  select(-publisher_platforms, -contains("impressions")) %>%
  unnest(demographic_distribution)
#> # A tibble: 13 × 4
#> id percentage age gender
#> <chr> <dbl> <chr> <chr>
#> 1 fake456 0.00220 35-44 unknown
#> 2 fake456 0.00220 55-64 unknown
#> 3 fake456 0.0264 35-44 female
#> 4 fake456 0.104 55-64 female
#> 5 fake456 0.00440 25-34 female
#> 6 fake456 0.0705 45-54 female
#> 7 fake456 0.163 55-64 male
#> 8 fake456 0.0374 25-34 male
#> 9 fake456 0.0727 35-44 male
#> 10 fake456 0.220 65+ male
#> 11 fake456 0.119 45-54 male
#> 12 fake456 0.176 65+ female
#> 13 fake456 0.00220 65+ unknown
Another word of caution: by default, unnest() drops rows whose nested
value is NULL. Since demographic_distribution is not available for ads
fake123 or fake789, these rows are dropped from the resulting dataset.
You can prevent this by setting keep_empty = TRUE.
data %>%
  select(-publisher_platforms, -contains("impressions")) %>%
  unnest(demographic_distribution, keep_empty = TRUE)
#> # A tibble: 15 × 4
#> id percentage age gender
#> <chr> <dbl> <chr> <chr>
#> 1 fake123 NA NA NA
#> 2 fake456 0.00220 35-44 unknown
#> 3 fake456 0.00220 55-64 unknown
#> 4 fake456 0.0264 35-44 female
#> 5 fake456 0.104 55-64 female
#> 6 fake456 0.00440 25-34 female
#> 7 fake456 0.0705 45-54 female
#> 8 fake456 0.163 55-64 male
#> 9 fake456 0.0374 25-34 male
#> 10 fake456 0.0727 35-44 male
#> 11 fake456 0.220 65+ male
#> 12 fake456 0.119 45-54 male
#> 13 fake456 0.176 65+ female
#> 14 fake456 0.00220 65+ unknown
#> 15 fake789 NA NA NA
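Before unnesting, it can help to check which ads would be dropped. The sketch below assumes, as described above, that a missing demographic_distribution is stored as NULL in the list column. The ads tibble is a hand-built stand-in for data, reproducing only two of the 13 demographic rows, and no_demographics is a hypothetical name.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Stand-in for `data`; only two of the 13 demographic rows are reproduced.
ads <- tibble(
  id = c("fake123", "fake456", "fake789"),
  demographic_distribution = list(
    NULL,
    tibble(
      percentage = c(0.163, 0.104),
      age = c("55-64", "55-64"),
      gender = c("male", "female")
    ),
    NULL
  )
)

# Ads whose nested value is NULL: the rows unnest() drops by default.
no_demographics <- ads %>%
  filter(vapply(demographic_distribution, is.null, logical(1))) %>%
  pull(id)

# Compare the default behaviour with keep_empty = TRUE.
dropped <- unnest(ads, demographic_distribution)
kept <- unnest(ads, demographic_distribution, keep_empty = TRUE)

no_demographics
nrow(dropped)
nrow(kept)
```

The difference between nrow(kept) and nrow(dropped) equals the number of ads listed in no_demographics, since each of them contributes exactly one all-NA row when keep_empty = TRUE.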
Careful about combinatorial explosion with unnesting
Unnesting multiple nested columns at the same time can create an undesired combinatorial explosion. For example,
data %>%
  select(id, publisher_platforms, demographic_distribution) %>%
  unnest(publisher_platforms, keep_empty = TRUE) %>%
  unnest(demographic_distribution, keep_empty = TRUE)
#> # A tibble: 29 × 5
#> id publisher_platforms percentage age gender
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 fake123 facebook NA NA NA
#> 2 fake123 messenger NA NA NA
#> 3 fake456 facebook 0.00220 35-44 unknown
#> 4 fake456 facebook 0.00220 55-64 unknown
#> 5 fake456 facebook 0.0264 35-44 female
#> 6 fake456 facebook 0.104 55-64 female
#> 7 fake456 facebook 0.00440 25-34 female
#> 8 fake456 facebook 0.0705 45-54 female
#> 9 fake456 facebook 0.163 55-64 male
#> 10 fake456 facebook 0.0374 25-34 male
#> # ℹ 19 more rows
In this example, although we only have three unique ad IDs, we’ve got
29 rows. The ad fake123 shows up twice, because it has two
publisher_platforms and no demographic_distribution; the ad fake456
shows up 26 times, because it has two publisher platforms and 13
demographic categories; and the ad fake789 shows up once.
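The row count can be predicted before unnesting, and the cross product avoided by keeping one long table per nested column. The following is a sketch under stated assumptions: ads is a hand-built stand-in for data with values copied from the printed output above, missing demographics are assumed to be stored as NULL, and the names expected_rows, platforms and demographics are hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Stand-in for `data`: values copied from the printed output above.
ads <- tibble(
  id = c("fake123", "fake456", "fake789"),
  publisher_platforms = list(
    c("facebook", "messenger"),
    c("facebook", "instagram"),
    "facebook"
  ),
  demographic_distribution = list(
    NULL,
    tibble(
      percentage = c(0.00220, 0.00220, 0.0264, 0.104, 0.00440, 0.0705, 0.163,
                     0.0374, 0.0727, 0.220, 0.119, 0.176, 0.00220),
      age = c("35-44", "55-64", "35-44", "55-64", "25-34", "45-54", "55-64",
              "25-34", "35-44", "65+", "45-54", "65+", "65+"),
      gender = c("unknown", "unknown", "female", "female", "female", "female",
                 "male", "male", "male", "male", "male", "female", "unknown")
    ),
    NULL
  )
)

# With keep_empty = TRUE an empty entry still contributes one row,
# so each ad yields max(n, 1) rows per nested column.
n_platforms <- lengths(ads$publisher_platforms)
n_demo <- vapply(
  ads$demographic_distribution,
  function(d) if (is.null(d)) 0L else nrow(d),
  integer(1)
)
expected_rows <- pmax(n_platforms, 1L) * pmax(n_demo, 1L)

# Unnesting both columns in sequence produces the cross product.
exploded <- ads %>%
  unnest(publisher_platforms, keep_empty = TRUE) %>%
  unnest(demographic_distribution, keep_empty = TRUE)

# One long table per nested column, each keyed by id, avoids it.
platforms <- ads %>%
  select(id, publisher_platforms) %>%
  unnest(publisher_platforms)
demographics <- ads %>%
  select(id, demographic_distribution) %>%
  unnest(demographic_distribution)

sum(expected_rows)
nrow(exploded)
```

Here expected_rows works out to 2, 26 and 1, matching the 29 rows explained above, while the two separate tables hold 5 and 13 rows and can be joined on id only when a particular analysis calls for it.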
Documentation of all field types
The full set of available fields is documented in the official Ad Library API documentation.
In general, fields of type list<string> are converted to list columns
of character vectors, while fields of type list<AudienceDistribution>
are converted to nested tibbles.
Flattened columns
Some fields are returned as a list containing a lower bound and an
upper bound. In the official API documentation these are called fields
of type InsightsRangeValue. Radlibrary flattens these into a lower and
an upper column. In this example, that applies to the impressions
field, which is flattened to impressions_lower and impressions_upper.
In general, InsightsRangeValue fields are flattened to columns named
<field name>_lower and <field name>_upper.
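To make the naming rule concrete, here is a hand-rolled sketch of the same kind of flattening in base R. It is not Radlibrary’s implementation: flatten_range and num_or_na are hypothetical helpers, and raw is a stand-in that assumes each range arrives as a list with string-valued lower_bound and upper_bound entries.

```r
# Stand-in for the raw records: each range is assumed to be a list with
# string-valued lower_bound and upper_bound entries.
raw <- list(
  list(id = "fake123", impressions = list(lower_bound = "1000", upper_bound = "4999")),
  list(id = "fake456", impressions = list(lower_bound = "0", upper_bound = "999")),
  list(id = "fake789", impressions = list(lower_bound = "0", upper_bound = "999"))
)

# Hypothetical helper: convert a bound to numeric, or NA if it is absent.
num_or_na <- function(x) if (is.null(x)) NA_real_ else as.numeric(x)

# Hypothetical helper: turn one range field into <field>_lower / <field>_upper.
flatten_range <- function(records, field) {
  out <- data.frame(
    lower = vapply(records, function(r) num_or_na(r[[field]]$lower_bound), numeric(1)),
    upper = vapply(records, function(r) num_or_na(r[[field]]$upper_bound), numeric(1))
  )
  names(out) <- paste0(field, c("_lower", "_upper"))
  out
}

flattened <- flatten_range(raw, "impressions")
flattened
```

The NA fallback is a defensive choice in this sketch, so that a record lacking a bound yields a missing value instead of an error.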
Don’t forget, you still have the raw data
All of the data returned by the Ad Library API is kept in the
response object. If the automatic transformations applied by as_tibble
aren’t ideal for you, you can always go into the raw data and process
it however you like.
response$data
#> [[1]]
#> [[1]]$id
#> [1] "fake123"
#>
#> [[1]]$publisher_platforms
#> [[1]]$publisher_platforms[[1]]
#> [1] "facebook"
#>
#> [[1]]$publisher_platforms[[2]]
#> [1] "messenger"
#>
#>
#> [[1]]$impressions
#> [[1]]$impressions$lower_bound
#> [1] "1000"
#>
#> [[1]]$impressions$upper_bound
#> [1] "4999"
#>
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] "fake456"
#>
#> [[2]]$publisher_platforms
#> [[2]]$publisher_platforms[[1]]
#> [1] "facebook"
#>
#> [[2]]$publisher_platforms[[2]]
#> [1] "instagram"
#>
#>
#> [[2]]$demographic_distribution
#> [[2]]$demographic_distribution[[1]]
#> [[2]]$demographic_distribution[[1]]$percentage
#> [1] "0.002203"
#>
#> [[2]]$demographic_distribution[[1]]$age
#> [1] "35-44"
#>
#> [[2]]$demographic_distribution[[1]]$gender
#> [1] "unknown"
#>
#>
#> [[2]]$demographic_distribution[[2]]
#> [[2]]$demographic_distribution[[2]]$percentage
#> [1] "0.002203"
#>
#> [[2]]$demographic_distribution[[2]]$age
#> [1] "55-64"
#>
#> [[2]]$demographic_distribution[[2]]$gender
#> [1] "unknown"
#>
#>
#> [[2]]$demographic_distribution[[3]]
#> [[2]]$demographic_distribution[[3]]$percentage
#> [1] "0.026432"
#>
#> [[2]]$demographic_distribution[[3]]$age
#> [1] "35-44"
#>
#> [[2]]$demographic_distribution[[3]]$gender
#> [1] "female"
#>
#>
#> [[2]]$demographic_distribution[[4]]
#> [[2]]$demographic_distribution[[4]]$percentage
#> [1] "0.103524"
#>
#> [[2]]$demographic_distribution[[4]]$age
#> [1] "55-64"
#>
#> [[2]]$demographic_distribution[[4]]$gender
#> [1] "female"
#>
#>
#> [[2]]$demographic_distribution[[5]]
#> [[2]]$demographic_distribution[[5]]$percentage
#> [1] "0.004405"
#>
#> [[2]]$demographic_distribution[[5]]$age
#> [1] "25-34"
#>
#> [[2]]$demographic_distribution[[5]]$gender
#> [1] "female"
#>
#>
#> [[2]]$demographic_distribution[[6]]
#> [[2]]$demographic_distribution[[6]]$percentage
#> [1] "0.070485"
#>
#> [[2]]$demographic_distribution[[6]]$age
#> [1] "45-54"
#>
#> [[2]]$demographic_distribution[[6]]$gender
#> [1] "female"
#>
#>
#> [[2]]$demographic_distribution[[7]]
#> [[2]]$demographic_distribution[[7]]$percentage
#> [1] "0.162996"
#>
#> [[2]]$demographic_distribution[[7]]$age
#> [1] "55-64"
#>
#> [[2]]$demographic_distribution[[7]]$gender
#> [1] "male"
#>
#>
#> [[2]]$demographic_distribution[[8]]
#> [[2]]$demographic_distribution[[8]]$percentage
#> [1] "0.037445"
#>
#> [[2]]$demographic_distribution[[8]]$age
#> [1] "25-34"
#>
#> [[2]]$demographic_distribution[[8]]$gender
#> [1] "male"
#>
#>
#> [[2]]$demographic_distribution[[9]]
#> [[2]]$demographic_distribution[[9]]$percentage
#> [1] "0.072687"
#>
#> [[2]]$demographic_distribution[[9]]$age
#> [1] "35-44"
#>
#> [[2]]$demographic_distribution[[9]]$gender
#> [1] "male"
#>
#>
#> [[2]]$demographic_distribution[[10]]
#> [[2]]$demographic_distribution[[10]]$percentage
#> [1] "0.220264"
#>
#> [[2]]$demographic_distribution[[10]]$age
#> [1] "65+"
#>
#> [[2]]$demographic_distribution[[10]]$gender
#> [1] "male"
#>
#>
#> [[2]]$demographic_distribution[[11]]
#> [[2]]$demographic_distribution[[11]]$percentage
#> [1] "0.118943"
#>
#> [[2]]$demographic_distribution[[11]]$age
#> [1] "45-54"
#>
#> [[2]]$demographic_distribution[[11]]$gender
#> [1] "male"
#>
#>
#> [[2]]$demographic_distribution[[12]]
#> [[2]]$demographic_distribution[[12]]$percentage
#> [1] "0.176211"
#>
#> [[2]]$demographic_distribution[[12]]$age
#> [1] "65+"
#>
#> [[2]]$demographic_distribution[[12]]$gender
#> [1] "female"
#>
#>
#> [[2]]$demographic_distribution[[13]]
#> [[2]]$demographic_distribution[[13]]$percentage
#> [1] "0.002203"
#>
#> [[2]]$demographic_distribution[[13]]$age
#> [1] "65+"
#>
#> [[2]]$demographic_distribution[[13]]$gender
#> [1] "unknown"
#>
#>
#>
#> [[2]]$impressions
#> [[2]]$impressions$lower_bound
#> [1] "0"
#>
#> [[2]]$impressions$upper_bound
#> [1] "999"
#>
#>
#>
#> [[3]]
#> [[3]]$id
#> [1] "fake789"
#>
#> [[3]]$publisher_platforms
#> [[3]]$publisher_platforms[[1]]
#> [1] "facebook"
#>
#>
#> [[3]]$impressions
#> [[3]]$impressions$lower_bound
#> [1] "0"
#>
#> [[3]]$impressions$upper_bound
#> [1] "999"
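As one illustration of processing the raw data by hand, the sketch below builds a long table of ad IDs and platforms using base R only. The raw list is a stand-in with the same shape as the publisher_platforms entries printed above; with a real response you would presumably start from response$data instead. The platform_rows name is hypothetical.

```r
# Stand-in with the same shape as the printed response$data entries.
raw <- list(
  list(id = "fake123", publisher_platforms = list("facebook", "messenger")),
  list(id = "fake456", publisher_platforms = list("facebook", "instagram")),
  list(id = "fake789", publisher_platforms = list("facebook"))
)

# One data frame per ad (id recycled across its platforms), then stacked.
platform_rows <- do.call(rbind, lapply(raw, function(ad) {
  data.frame(
    id = ad$id,
    platform = unlist(ad$publisher_platforms),
    stringsAsFactors = FALSE
  )
}))

platform_rows
```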