2 December 2016 @ 12-2pm
An Application Programming Interface (API) is a tap that an internet service such as Facebook or Twitter (but also Transport for NSW) decides to install to facilitate access to its data on a large scale. In order to use an API, you usually need to register and receive a set of unique credentials. The credentials allow you to interrogate the API within the limits established by the service. For example, Google Maps has an API to geolocate a string of text (let's say sydney opera house). The service is free but limited to 2500 requests per day; if you need a higher limit, you pay.
install.packages('ggmap')
library(ggmap)
geocode('sydney opera house')
from which you should get
lon lat
1 -95.69871 29.99138
A for-loop is such a key concept in programming that it even has its own Wikipedia article.
Simply put, a for-loop repeats a task a number of times (possibly an infinite number of times). So let's say we have a task, which we can express as a list of elementary functions. Remember, a function is a predefined set of instructions that we execute by calling it with myFunction(); a function can optionally take inputs and return an output. So we are sitting at our desk, next to a pile of papers, and our task is to mark all the papers. Let's think about papers as a vector containing a series of 100 objects (the students' papers). The functions we need to do the marking are basically two:
readPaper()
markPaper()
and we need to repeat them 100 times.
A for-loop allows us to do exactly that. We take our papers (one by one) from the pile of papers and once marked we put them in a different pile (let's call it marked_papers). This is how you would do your marking in R with a for-loop (remember that the symbol <- assigns the output of a function to an object):
marked_papers <- character()
for (paper in papers) {
  idea_of_paper <- readPaper(paper)
  marked_paper <- markPaper(idea_of_paper)
  marked_papers <- c(marked_papers, marked_paper)
}
Let's try to understand this code. First, we already have the object papers but we need to create a new (empty) object to store our marked papers. You don't want to lose all your work! So, with the first line we create a new empty character vector and we call it marked_papers.
The actual loop starts on the second line. We declare the start of the loop with for (...) { and we close it with }. Everything within { ... } will be repeated a number of times. Yes, but how many times? The instructions for our loop are (paper in papers), which reads: "for every single paper in papers run the following lines". In other words, the number of iterations of the for-loop will depend on the number of objects contained in papers.
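As a quick check of this point, here is a minimal sketch that counts the iterations of a loop. The three-element papers vector and the counter n_iterations are made up for illustration and are not part of the workshop code.

```r
# Hypothetical stand-in for the pile of papers: three instead of 100
papers <- c("paper A", "paper B", "paper C")

n_iterations <- 0
for (paper in papers) {
  # This line runs once for every element of papers
  n_iterations <- n_iterations + 1
}

n_iterations == length(papers)
# [1] TRUE
```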
There is something important to understand here. An object paper is created at the beginning of each iteration (containing a different paper every time). That is, the value of paper lasts only for that iteration: at the next iteration it is overwritten, and if you don't save it somewhere it is gone. Nevertheless we are not interested in saving a copy of each paper (which is already contained in our papers) but only a copy of it once it has been marked (marked_paper). But again, at every iteration the line marked_paper <- markPaper(idea_of_paper) will replace any previously existing marked_paper with a new one. At the end of the for-loop, after 100 iterations, there will be only one marked_paper in memory: the last one. To store all the papers we have marked we need to combine (with the function c()) each one into our vector marked_papers, which will be of length 0 at the beginning of the first iteration and of length 100 at the end of the last.
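To see this behaviour for yourself, here is a minimal runnable sketch of the marking loop. The bodies of readPaper() and markPaper() and the three example papers are assumptions made up for illustration; the workshop text does not define them.

```r
# Hypothetical stand-ins: the workshop text does not define these functions
readPaper <- function(paper) {
  # "Reading" here just turns the text to lower case
  tolower(paper)
}
markPaper <- function(idea_of_paper) {
  # "Marking" here just prefixes the text with a label
  paste("marked:", idea_of_paper)
}

# Three made-up papers instead of 100
papers <- c("Essay One", "Essay Two", "Essay Three")

marked_papers <- character()
for (paper in papers) {
  idea_of_paper <- readPaper(paper)
  marked_paper <- markPaper(idea_of_paper)
  marked_papers <- c(marked_papers, marked_paper)
}

# All marked papers were kept in the vector
length(marked_papers)
# [1] 3

# Only the last marked paper survives in marked_paper
marked_paper
# [1] "marked: essay three"
```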
The code is particularly dense this time. In addition to the for-loop (which is a controller or control-flow construct), we use new packages, new functions and we introduce some logical operators. The code for the next workshop is here. Download the file and open it in RStudio. But first I suggest you read this section, so as to have an idea of the new packages and programming concepts introduced with the code.
The two most common classes of operators are relational operators and logical operators.
Relational operators are
x < y, which tests if x is less than y
x > y, which tests if x is more than y
x <= y, which tests if x is less than or equal to y
x >= y, which tests if x is more than or equal to y
x == y, which tests if x is equal to y (different from =, which doesn't test anything but assigns the value on the right-hand side of the sign to the variable on the left-hand side)
x != y, which tests if x is different from y
The most common logical operators are
!, which indicates logical negation (NOT)
&, which indicates logical AND
|, which indicates logical OR
Let's start now by assigning to variable the value 5
variable = 5
Then let's run some tests using the relational operators
variable > 10
# [1] FALSE
variable == 5
# [1] TRUE
variable >= 5
# [1] TRUE
variable != 5
# [1] FALSE
Finally let's combine them with the logical operators (remember, the value of variable is 5)
!(variable == 5)
# [1] FALSE
(variable == 5) & (variable+1 < 6)
# [1] FALSE
but
(variable == 5) | (variable+1 < 6)
# [1] TRUE
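The same operators also work element by element when the object holds more than one value. The sketch below assumes a made-up vector marks and a made-up name in_band; neither comes from the workshop code.

```r
# Made-up marks, for illustration only
marks <- c(45, 62, 78, 90)

# A relational operator is applied to every element
marks >= 50
# [1] FALSE  TRUE  TRUE  TRUE

# Two tests combined with logical AND
in_band <- (marks >= 50) & (marks < 80)
in_band
# [1] FALSE  TRUE  TRUE FALSE

# Logical negation flips every element
!in_band
# [1]  TRUE FALSE FALSE  TRUE
```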
Controllers (more precisely control-flow constructs) are fundamental in every programming language because they allow the programmer to add conditions to the flow of the program. We already described the use of one controller (for). The other popular controller is if (which we can combine with else).
if (variable == 2) {
  doSomething()
} else {
  doSomethingElse()
}
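The two controllers are often used together. As a minimal sketch, the loop below applies an if/else test to each element of a vector; the marks, the pass threshold of 50 and the names result and results are assumptions made up for illustration.

```r
# Hypothetical marks and pass threshold, for illustration only
marks <- c(45, 62, 78)
results <- character()

for (mark in marks) {
  if (mark >= 50) {
    result <- "pass"
  } else {
    result <- "fail"
  }
  # Keep every result, not just the last one
  results <- c(results, result)
}

results
# [1] "fail" "pass" "pass"
```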