This is a tutorial for the reticulate package, which lets you run Python code directly from R. It translates data between R and Python objects, allowing almost seamless integration between the two languages. While it is not the first package of this type, its ease of use and range of features make it very useful.
What you can do with this package includes, but is not limited to:
- Calling Python in R
- Translating between R and Python objects
- Using different versions of Python by creating virtual environments
Reticulate does this by embedding a Python session in R.
To get started, install the reticulate package:
pti <- c("reticulate")
# Keep only the packages that are not installed yet
pti <- pti[!(pti %in% rownames(installed.packages()))]
if (length(pti) > 0) {
  install.packages(pti)
}
library(reticulate)
Set the path to the correct version of Python using the use_python() function. While not strictly required, explicitly choosing a Python instance is a best practice. Once an instance is chosen for that session it cannot be changed.
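Instead of specifying the location of Python, we can also call the repl_python() function, which starts an interactive Python session inside R using whichever Python installation reticulate finds.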
#use_python("/Users/egcanmac/opt/anaconda3/bin/python", required = T)
#repl_python()
However, when writing an R Markdown document, you can also use Python code chunks by writing ```{python} instead of ```{r}.
Python code chunks work exactly like R code chunks: Python code is executed and any print or graphical output is included within the document.
Python chunks all execute within a single Python session, so they have access to all objects created in previous chunks. Chunk options like echo, include, etc. all work as expected.
The examples below come from an R Markdown document and demonstrate this:
A basic example that concatenates two strings:
a = "Hello" + " World"
print(a)
## Hello World
A basic example that repeats a string a given number of times:
a = "Hello"
print(a*3 + "!")
## HelloHelloHello!
A Python for loop:
s = 0
for i in range(10):
    s = s + i
print(s)
## 45
We can also run Python code from R chunks using the py_run_string function.
#Defining Variables in R
a= "Hello"
b= "World"
#Using variables defined in R in Python
py_run_string("print(r.a*3)")
py_run_string("print(r.a+r.b)")
#Using variables defined in Python in R
py_run_string("a=[1,2,3,4,5]")
a_r <- py$a
print(a_r)
## [1] 1 2 3 4 5
As you can see, the string results are the same. Also, the list that is defined in Python becomes a regular R vector when it is accessed from R.
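To make the conversion a little more concrete, here is a small sketch of Python values one might define in a Python chunk and then read from R as py$same_type, py$mixed, and so on. The variable names are hypothetical. The R-side types mentioned in the comments reflect my understanding of reticulate's documented conversion rules and are worth checking with class() in your own session.

```python
# Values that could be defined in a Python chunk and read from R via py$<name>.
# The R-side results in the comments are assumptions based on reticulate's
# documented type conversions, not output captured from a session.
same_type = [1, 2, 3, 4, 5]    # homogeneous list: expected to become an R vector
mixed = [1, "two", 3.0]        # heterogeneous list: expected to become an R list
pair = (1, "a")                # tuple: expected to become an R list
lookup = {"a": 1, "b": 2}      # dict: expected to become a named R list

# On the Python side the types are unambiguous:
for name, value in [("same_type", same_type), ("mixed", mixed),
                    ("pair", pair), ("lookup", lookup)]:
    print(name, type(value).__name__)
```

Only the first case is confirmed by the output above, where the Python list [1,2,3,4,5] printed as a plain R vector.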
A Python function to write Fibonacci series up to n:
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()
Let’s call the function that we created above:
fib(2000)
## 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
It can be seen that base Python runs seamlessly in an R Markdown document.
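Because fib only prints its values, nothing is left behind for R to pick up. A variant that returns the series as a list might look like the sketch below. The name fib_list is a hypothetical helper, not part of the original tutorial.

```python
def fib_list(n):
    """Return the Fibonacci numbers smaller than n as a list (hypothetical helper)."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

# Store the result in a variable so that it stays in the Python session.
series = fib_list(2000)
print(series[-1])  # 1597, the last value printed by fib(2000) above
```

Defined in a Python chunk, series should then be reachable from an R chunk as py$series, in the same way as the list a earlier.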
We can even use Python libraries with the help of the function py_run_string:
py_run_string("import numpy as np")
py_run_string("my_python_array = np.array([2,4,6,8])")
This is a numpy array that is defined in Python.
py_run_string("print(my_python_array)")
As you can see, the array can be printed using py_run_string again.
We can also print the array in R using the following notation:
print(py$my_python_array)
## [1] 2 4 6 8
Let’s compare the type of the object as seen from Python with its class as seen from R:
py_run_string("print(type(my_python_array))")
class(py$my_python_array)
## [1] "array"
In Python, the object is a NumPy array (numpy.ndarray); when it is accessed from R, it is converted to a regular R array.
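The R class depends on the shape of the NumPy array, so it can help to inspect the array on the Python side first. The sketch below does this for the same four values. The reshaped as_matrix variable is a hypothetical addition for illustration.

```python
import numpy as np

my_python_array = np.array([2, 4, 6, 8])

# A one-dimensional ndarray with four elements.
print(type(my_python_array).__name__)                # ndarray
print(my_python_array.ndim, my_python_array.shape)   # 1 (4,)

# Hypothetical 2 x 2 variant of the same data. A two-dimensional ndarray
# is expected to arrive in R as a matrix rather than a 1-D array.
as_matrix = my_python_array.reshape(2, 2)
print(as_matrix.ndim, as_matrix.shape)               # 2 (2, 2)
```

This matches what the tutorial shows: the one-dimensional array has class "array" in R, while the two-dimensional matmul result further on has class "matrix" "array".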
We can also loop over the array that we created, again inside py_run_string:
py_run_string("for item in my_python_array: print(item)")
We can also import a Python library into an R object:
np<- import('numpy')
Let’s create two matrices:
matrix1 <- matrix(data = 1:25, nrow = 5, ncol = 5)
matrix2 <- matrix(data = 25:1, nrow = 5, ncol = 5)
Let’s say we want to use the matmul function in the package numpy. We can do that in the following way:
matrix3 <- np$matmul(matrix1, matrix2)
class(matrix3)
## [1] "matrix" "array"
print(matrix3)
##      [,1] [,2] [,3] [,4] [,5]
## [1,] 1215  940  665  390  115
## [2,] 1330 1030  730  430  130
## [3,] 1445 1120  795  470  145
## [4,] 1560 1210  860  510  160
## [5,] 1675 1300  925  550  175
We used a Python library function successfully.
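To check the result purely on the Python side, the two matrices can be rebuilt with NumPy. R's matrix() fills column by column, so the sketch below uses order="F" to mimic that layout. This is an illustrative reconstruction and makes no claim about how reticulate converts matrices internally.

```python
import numpy as np

# order="F" fills the 5 x 5 grid column by column, like R's matrix().
matrix1 = np.arange(1, 26).reshape((5, 5), order="F")
matrix2 = np.arange(25, 0, -1).reshape((5, 5), order="F")

matrix3 = np.matmul(matrix1, matrix2)

print(matrix1[0])  # first row of matrix1: 1, 6, 11, 16, 21
print(matrix3[0])  # first row of the product: 1215, 940, 665, 390, 115
```

With the default row-major order the product would differ, which is why the fill order matters when comparing results between the two languages.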
Let’s try to define some variables in R and use them in Python:
mpg <- mtcars$mpg
cyl <- mtcars$cyl
We can use Python’s pandas library to manipulate the data a bit:
import pandas as pd
py_mpg = r.mpg
py_cyl = r.cyl
df=pd.DataFrame()
df["mpg"]=py_mpg
df["cyl"]=py_cyl
df_filtered=df.loc[df['mpg'] > 16]
df_filtered
## mpg cyl
## 0 21.0 6.0
## 1 21.0 6.0
## 2 22.8 4.0
## 3 21.4 6.0
## 4 18.7 8.0
## 5 18.1 6.0
## 7 24.4 4.0
## 8 22.8 4.0
## 9 19.2 6.0
## 10 17.8 6.0
## 11 16.4 8.0
## 12 17.3 8.0
## 17 32.4 4.0
## 18 30.4 4.0
## 19 33.9 4.0
## 20 21.5 4.0
## 24 19.2 8.0
## 25 27.3 4.0
## 26 26.0 4.0
## 27 30.4 4.0
## 29 19.7 6.0
## 31 21.4 4.0
Let’s look at this filtered data frame from an R chunk:
py$df_filtered
## mpg cyl
## 0 21.0 6
## 1 21.0 6
## 2 22.8 4
## 3 21.4 6
## 4 18.7 8
## 5 18.1 6
## 7 24.4 4
## 8 22.8 4
## 9 19.2 6
## 10 17.8 6
## 11 16.4 8
## 12 17.3 8
## 17 32.4 4
## 18 30.4 4
## 19 33.9 4
## 20 21.5 4
## 24 19.2 8
## 25 27.3 4
## 26 26.0 4
## 27 30.4 4
## 29 19.7 6
## 31 21.4 4
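Note that the filtered data frame keeps its original zero-based pandas row labels, which is why the labels jump from 5 to 7 in both printouts. If consecutive labels are preferred before handing the data back to R, reset_index can renumber them. The sketch below types in the first eight mpg and cyl values of mtcars by hand so that it runs without R; df_renumbered is a hypothetical name.

```python
import pandas as pd

# First eight mpg/cyl values of mtcars, entered by hand for this sketch.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4]
cyl = [6, 6, 4, 6, 8, 6, 8, 4]

df = pd.DataFrame({"mpg": mpg, "cyl": cyl})
df_filtered = df.loc[df["mpg"] > 16]

# The row with mpg 14.3 (label 6) is dropped, the other labels are kept.
print(list(df_filtered.index))

# drop=True discards the old labels instead of adding them as a column.
df_renumbered = df_filtered.reset_index(drop=True)
print(list(df_renumbered.index))
```

Here cyl is entered as integers. In the tutorial it prints as 6.0 in pandas, presumably because R numeric vectors are doubles and arrive in Python as floats.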
Let’s try to draw a graph using matplotlib:
import numpy as np
import numpy.random as rng
import matplotlib.pyplot as plt

n_points = int(1e3)
# Squared radii in [1, 4) and uniformly random angles in [0, 2*pi)
radii = (rng.random(n_points) + 1)**2
iota = 2*np.pi*rng.random(n_points)
# Convert the polar coordinates to Cartesian positions
x_posit = np.sqrt(radii)*np.cos(iota)
y_posit = np.sqrt(radii)*np.sin(iota)
# Plot the points as green circles
plt.plot(x_posit, y_posit, 'go')
plt.show()
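The shape of the resulting plot follows from the arithmetic: the radii are squares of numbers between 1 and 2, and the positions use their square roots, so every point should lie between 1 and 2 units from the origin, forming a ring. The sketch below checks this numerically without plotting. It uses a seeded default_rng generator instead of the module-level functions, which is my own choice for reproducibility.

```python
import numpy as np

# Seeded generator so the check is reproducible (an assumption of this sketch;
# the tutorial code uses the unseeded numpy.random module functions).
generator = np.random.default_rng(0)
n_points = 1000

radii = (generator.random(n_points) + 1) ** 2
iota = 2 * np.pi * generator.random(n_points)
x_posit = np.sqrt(radii) * np.cos(iota)
y_posit = np.sqrt(radii) * np.sin(iota)

# Distance of every point from the origin.
dist = np.hypot(x_posit, y_posit)
print(dist.min(), dist.max())
```

Both printed distances should fall between 1 and 2, up to floating-point rounding.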