Chapter 2 First steps
2.1 Packages
The R code relies on generalist packages to access the API and manipulate the response.
library("httr")
library("dplyr")
library("jsonlite")
If you don’t have these packages installed already, you need to run
install.packages("httr")
install.packages("jsonlite")
install.packages("dplyr")
2.2 Credentials
Let’s store our bearer token in an environment variable (let’s call it BEARER_TOKEN):
Sys.setenv(BEARER_TOKEN = "copy-your-bearer-token-here")
We are then able to get the token back with
Sys.getenv("BEARER_TOKEN")
The idea is to run Sys.setenv() from the console before running our scripts (that is, at the start of every R session!) so that the token is never written into a script file. Of course, if you don’t care you can just store it in a regular variable.
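A script can also check that the token is actually available before it sends anything. The following is a minimal sketch and not part of the original workflow: get_bearer_token() is a hypothetical helper name, and it only assumes that the token lives in the BEARER_TOKEN environment variable as described above.

```r
# Hypothetical helper: return the bearer token stored in an environment
# variable, or stop with a clear message if the variable is empty or unset.
get_bearer_token <- function(var = "BEARER_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var,
         " is not set; run Sys.setenv() from the console first.")
  }
  token
}
```

Calling get_bearer_token() at the top of a script makes a forgotten Sys.setenv() fail immediately with a readable message.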
2.3 Interrogating the Twitter API
The Twitter API accepts two methods to exchange information: POST and GET. Intuitively, with the POST method we send information to a server while with the GET method we retrieve information. With the Twitter API, the GET method is used more frequently. Still, we need to use the POST method to define our search rules before we GET the Filtered stream.
This is what a GET request using the httr package looks like:
httr::GET(url,
          httr::add_headers(.headers = headers),
          query = params)
The url is a simple character variable, headers is a named character vector and params is a list.
But let’s send a GET request!
We first need to set the URL, specify our request headers (these are not going to change, so you can place them at the top of your document) and set the parameters for the query.
url <- "https://api.twitter.com/2/tweets/counts/recent"
headers <- c(`Authorization` = sprintf('Bearer %s',
                                       Sys.getenv("BEARER_TOKEN")))
params <- list(query = "from:TwitterDev",
               granularity = "day")
What are we doing here?
With url we specify the endpoint we want to use for this API request. The Twitter API has several endpoints. Note that sometimes we need to include parameters in the URL itself instead of passing them through the HTTP query.
headers is the first layer of information that we send over to the server. In this case it contains our token. If the token is accepted, the status of the request is 200 OK and the API is ready to process our request. If the token is not accepted, we get the status 401 Unauthorized. Note that these codes and messages define the status of the HTTP request. The Twitter API has a different set of error codes. In this sense, we can get a 200 OK from the HTTP layer and still get an error from the API layer, reported in the body of the response (think in stacks!).
With params we define the query we want to append to the URL. Functionally, you can imagine that the key-value pairs we define in the list object params are appended after the string we set with url and a ? (for example, in https://example.com/over/there?name=ferret the query is defined by the key-value pair name=ferret).
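To make the "appended after a ?" idea concrete, here is a small base R sketch that assembles such a URL by hand. It is only an illustration: httr does this for us when we pass query = params, and build_query_url() is a hypothetical helper name, not a function from httr.

```r
# Hypothetical helper: append a list of key-value pairs to a URL as a query
# string, percent-encoding the values along the way.
build_query_url <- function(url, params) {
  values <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  pairs <- paste0(names(params), "=", values)
  paste0(url, "?", paste(pairs, collapse = "&"))
}

url <- "https://api.twitter.com/2/tweets/counts/recent"
params <- list(query = "from:TwitterDev",
               granularity = "day")

# The full URL that the key-value pairs in params translate into
full_url <- build_query_url(url, params)
print(full_url)
```

Each pair becomes key=value, pairs are joined with &, and characters such as : are percent-encoded so that they cannot be mistaken for URL syntax.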
Now we can pass these as arguments to the function GET and collect the response in res.
res <- httr::GET(url,
                 httr::add_headers(.headers = headers),
                 query = params)
By printing res we see details about the HTTP response (but not yet the API response or the content returned from the API).
print(res)
## Response [https://api.twitter.com/2/tweets/counts/recent?query=from%3ATwitterDev&granularity=day]
## Date: 2022-03-28 23:40
## Status: 200
## Content-Type: application/json; charset=utf-8
## Size: 729 B
If our request was authorised we should get
Status: 200
If our request was not authorised (likely because the token was not correctly specified), we should instead get
Status: 401
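In a script we would rather test the status than read it off the printed response. The sketch below is one possible way to do that, covering only the two statuses discussed here; describe_status() is a hypothetical helper name. With a real response, the numeric code would come from httr::status_code(res).

```r
# Hypothetical helper: translate an HTTP status code into a short message.
# Only the two statuses discussed in the text are handled explicitly.
describe_status <- function(status) {
  if (status == 200) {
    "OK: the HTTP request was accepted"
  } else if (status == 401) {
    "Unauthorized: check the bearer token"
  } else {
    paste("Unexpected HTTP status:", status)
  }
}

describe_status(200)
describe_status(401)
```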
Assuming that we got an OK from the HTTP layer, we can access the content we received as a response from the API layer. We access it with the function httr::content().
obj.json <- httr::content(res, as = "text")
Now by default the Twitter API responses are in JSON format, which looks like this:
print(jsonlite::prettify(obj.json, indent = 4))
## {
## "data": [
## {
## "end": "2022-03-22T00:00:00.000Z",
## "start": "2022-03-21T23:40:12.000Z",
## "tweet_count": 0
## },
## {
## "end": "2022-03-23T00:00:00.000Z",
## "start": "2022-03-22T00:00:00.000Z",
## "tweet_count": 0
## },
## {
## "end": "2022-03-24T00:00:00.000Z",
## "start": "2022-03-23T00:00:00.000Z",
## "tweet_count": 2
## },
## {
## "end": "2022-03-25T00:00:00.000Z",
## "start": "2022-03-24T00:00:00.000Z",
## "tweet_count": 1
## },
## {
## "end": "2022-03-26T00:00:00.000Z",
## "start": "2022-03-25T00:00:00.000Z",
## "tweet_count": 0
## },
## {
## "end": "2022-03-27T00:00:00.000Z",
## "start": "2022-03-26T00:00:00.000Z",
## "tweet_count": 0
## },
## {
## "end": "2022-03-28T00:00:00.000Z",
## "start": "2022-03-27T00:00:00.000Z",
## "tweet_count": 0
## },
## {
## "end": "2022-03-28T23:40:12.000Z",
## "start": "2022-03-28T00:00:00.000Z",
## "tweet_count": 1
## }
## ],
## "meta": {
## "total_tweet_count": 4
## }
## }
##
We can use the jsonlite package to translate the JSON-formatted string into an R object with
obj.r <- jsonlite::fromJSON(obj.json)
print(obj.r)
## $data
## end start tweet_count
## 1 2022-03-22T00:00:00.000Z 2022-03-21T23:40:12.000Z 0
## 2 2022-03-23T00:00:00.000Z 2022-03-22T00:00:00.000Z 0
## 3 2022-03-24T00:00:00.000Z 2022-03-23T00:00:00.000Z 2
## 4 2022-03-25T00:00:00.000Z 2022-03-24T00:00:00.000Z 1
## 5 2022-03-26T00:00:00.000Z 2022-03-25T00:00:00.000Z 0
## 6 2022-03-27T00:00:00.000Z 2022-03-26T00:00:00.000Z 0
## 7 2022-03-28T00:00:00.000Z 2022-03-27T00:00:00.000Z 0
## 8 2022-03-28T23:40:12.000Z 2022-03-28T00:00:00.000Z 1
##
## $meta
## $meta$total_tweet_count
## [1] 4
This is the number of tweets posted by @TwitterDev in each daily interval of the seven days before our request.
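The start and end columns are still character strings at this point. As a sketch of a possible next step, the code below converts them to date-time values and sums the counts. To stay self-contained it uses three rows copied by hand from the output above rather than the full response; with a real response, counts would be obj.r$data.

```r
# Three rows copied from the printed output above (an excerpt, not the full
# response). With a real response this would be: counts <- obj.r$data
counts <- data.frame(
  end = c("2022-03-24T00:00:00.000Z",
          "2022-03-25T00:00:00.000Z",
          "2022-03-26T00:00:00.000Z"),
  start = c("2022-03-23T00:00:00.000Z",
            "2022-03-24T00:00:00.000Z",
            "2022-03-25T00:00:00.000Z"),
  tweet_count = c(2L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 timestamps (UTC, with fractional seconds) into POSIXct
iso_format <- "%Y-%m-%dT%H:%M:%OSZ"
counts$start <- as.POSIXct(counts$start, format = iso_format, tz = "UTC")
counts$end <- as.POSIXct(counts$end, format = iso_format, tz = "UTC")

# Total number of tweets over the selected intervals
sum(counts$tweet_count)
```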
2.4 Twitter API v1.1 and Twitter API v2
Currently both the v1.1 and v2 versions of the Twitter API are online and accepting requests. Still, not all endpoints available in v1.1 are already implemented in v2, so we will need to use both versions. The main issue with that is that API errors are returned in two different formats. The best way to see how errors are returned is… to trigger some errors!
2.4.1 API v1.1 errors
Let’s get an API error first by requesting trends for a place that doesn’t exist.
url <- "https://api.twitter.com/1.1/trends/place.json"
params <- list(id = "THIS_ID_DOESNT_EXIST")
res <- httr::GET(url,
                 httr::add_headers(.headers = headers),
                 query = params)
obj.json <- httr::content(res, as = "text")
obj.r <- jsonlite::fromJSON(obj.json)
When using the API v1.1, you should expect to deal with this structure when you hit an error:
print(str(obj.r))
## List of 1
## $ errors:'data.frame': 1 obs. of 2 variables:
## ..$ code : int 34
## ..$ message: chr "Sorry, that page does not exist."
## NULL
obj.r is a list containing a single data.frame in an item named errors. To check if the result contains an error we can do "errors" %in% names(obj.r), which will return TRUE if we hit an API v1.1 error and FALSE if the API returned the information we requested. Let’s try it out:
"errors" %in% names(obj.r)
## [1] TRUE
but
params <- list(id = "1") # This ID instead exists
res <- httr::GET(url,
                 httr::add_headers(.headers = headers),
                 query = params)
obj.json <- httr::content(res, as = "text")
obj.r <- jsonlite::fromJSON(obj.json)
"errors" %in% names(obj.r)
## [1] FALSE
2.4.2 API v2 errors
We can then trigger an error from the v2 API with the following code. The structure of the response object is going to be different.
url <- "https://api.twitter.com/2/tweets/counts/recent"
params <- list(squery = "The parameter's name is mispelled")
res <- httr::GET(url,
                 httr::add_headers(.headers = headers),
                 query = params)
obj.json <- httr::content(res, as = "text")
obj.r <- jsonlite::fromJSON(obj.json)
print(str(obj.r))
## List of 4
## $ errors:'data.frame': 2 obs. of 2 variables:
## ..$ parameters:'data.frame': 2 obs. of 2 variables:
## .. ..$ query :List of 2
## .. .. ..$ : list()
## .. .. ..$ : NULL
## .. ..$ squery:List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr "The parameter's name is mispelled"
## ..$ message : chr [1:2] "The `query` query parameter can not be empty" "The query parameter [squery] is not one of [query,start_time,end_time,since_id,until_id,next_token,pagination_t"| __truncated__
## $ title : chr "Invalid Request"
## $ detail: chr "One or more parameters to your request was invalid."
## $ type : chr "https://api.twitter.com/2/problems/invalid-request"
## NULL
but luckily we can still check whether our response contains an error with
"errors" %in% names(obj.r)
## [1] TRUE
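Since the same check works for both versions, it can be wrapped in small helpers. This is a minimal sketch under two assumptions: the function names has_api_error() and api_error_messages() are hypothetical, and it relies on both error formats shown above exposing a message column inside errors. The objects below are hand-built stand-ins for parsed responses, not real API output.

```r
# Hypothetical helper: TRUE if the parsed response has an item named "errors"
has_api_error <- function(obj) {
  "errors" %in% names(obj)
}

# Hypothetical helper: the error messages, or an empty character vector
api_error_messages <- function(obj) {
  if (!has_api_error(obj)) {
    return(character(0))
  }
  obj$errors$message
}

# Hand-built stand-ins mimicking the structures printed above
v1_error <- list(
  errors = data.frame(code = 34L,
                      message = "Sorry, that page does not exist.",
                      stringsAsFactors = FALSE)
)
v2_error <- list(
  errors = data.frame(message = "The `query` query parameter can not be empty",
                      stringsAsFactors = FALSE),
  title = "Invalid Request"
)
no_error <- list(data = data.frame(tweet_count = 0L),
                 meta = list(total_tweet_count = 0L))

has_api_error(v1_error)
has_api_error(v2_error)
has_api_error(no_error)
api_error_messages(v1_error)
```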