Coding Tutorial

Need to reuse code from a different programming language in the one you are most comfortable with? A nifty set of libraries, known as bridge libraries, may be able to help you with the task at hand.

What are bridge libraries?

Bridge libraries, aka language interfaces or bindings, are specialized packages that make code written in another programming language accessible from the language you are using. So, if you are working in Python and want to use some R code or even an entire R library, the right bridge library makes this possible.

Bridge libraries are especially useful if you are new to a programming language and aren't familiar with the specialized libraries for particular tasks that you were comfortable doing in another language. Additionally, you may work in an organization that has a lot of legacy code that's reliable, but it's in another language. To leverage that code while everything is being migrated to the new programming language, you can use bridge libraries as a temporary solution. Finally, you may wish to use bridge libraries to harness specialized functions and libraries that are not available in the language you are using but are in a reliable state in other languages.

Note that you need to be somewhat familiar with data structures if you are to use bridge libraries effectively. After all, you often need to import whole classes from the other language, which involves specialized structures too. So, be sure to familiarize yourself with everything each class and library you import entails before using it via a bridge library.
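Here is a small example of why data structures matter. Python's standard-library ctypes module is a bridge from Python to C, though it is not one of the libraries covered later in this article. The minimal sketch below assumes a Unix-like system where the C standard library can be found. Before each C function is called, its argument and return types must be declared so that Python values are converted correctly across the language boundary. The helper names c_abs and c_strlen are hypothetical and exist only for this illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library (assumes a Unix-like system). If
# find_library() returns None, CDLL(None) falls back to the symbols already
# loaded into the running Python process, which include libc on Linux/macOS.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signatures: int abs(int) and size_t strlen(const char *).
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_abs(value: int) -> int:
    """Hypothetical helper: a Python int becomes a C int, and back again."""
    return libc.abs(value)


def c_strlen(text: str) -> int:
    """Hypothetical helper: a Python str must first become bytes (a char *)."""
    return libc.strlen(text.encode("utf-8"))


print(c_abs(-42))          # 42
print(c_strlen("bridge"))  # 6
```

If the signature declarations are wrong, ctypes will still make the call but may misread the data. That is exactly the kind of structural detail you need to understand before relying on a bridge.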

Pros and Cons of bridge libraries

Pros

Bridge libraries can be a great way to get things done in a programming language that's either new to you or new overall. In such situations, you may not know the best way to accomplish the task at hand, and learning it may require know-how beyond your current level. By bridging to another language you are more familiar with, you solve the problem, at least in the short term, without getting bogged down by the tools you are using.

Furthermore, a bridge library can help you become more flexible in programming and perhaps even develop more efficient ways of doing things. If you have a good command of another programming language, you can leverage it through a bridge library and fashion a hybrid script that gets the best of both worlds. This is particularly the case when you use C or Julia as the auxiliary language, since these languages tend to be more efficient than most high-level languages used in data science work.

Finally, bridge libraries can be a great incentive for learning the primary programming language. Once you get past the initial hurdle of getting something done in that language by borrowing functionality from other languages, you can gradually explore ways of doing the same thing using only the primary language. As a bonus, since the bridge script already gives you the correct result, you can check your own attempts against it and won't need much external help along the way.

Cons

Bridge libraries are not without their drawbacks, however. For instance, they tend to be less efficient, since passing data and calls between two language runtimes adds overhead and increases resource usage (this is especially an issue when the auxiliary language is not particularly fast). This becomes an even bigger problem if you need to write something to be deployed in production, in which case you are often better off without a bridge library. Bridge libraries may be a great tool, but they are not a panacea.
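One rough way to see per-call overhead is to time the same operation with and without a bridge. The sketch below again uses Python's built-in ctypes module as the bridge, assuming a Unix-like system. It compares Python's built-in abs() with C's abs() called through ctypes. The absolute numbers depend entirely on your machine. On typical CPython builds, though, the ctypes route is usually slower per call, because arguments and return values are converted every time they cross the boundary. The helper time_per_call and the call count N are arbitrary choices for this illustration.

```python
import ctypes
import ctypes.util
import timeit

# Load the C standard library (assumes a Unix-like system; CDLL(None) is used
# if find_library() returns None, which still exposes libc on Linux/macOS).
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

N = 200_000  # number of calls to time; arbitrary


def time_per_call(func, n=N):
    """Return the average wall-clock seconds per call of func(-5)."""
    total = timeit.timeit(lambda: func(-5), number=n)
    return total / n


native = time_per_call(abs)        # Python's built-in abs, no bridge
bridged = time_per_call(libc.abs)  # same operation, routed through ctypes

print(f"built-in abs:        {native * 1e9:.0f} ns per call")
print(f"libc abs via ctypes: {bridged * 1e9:.0f} ns per call")
```

A tiny function like abs() is a worst case, because the crossing costs more than the work itself. Bridges pay off when each call does a large amount of work on the other side.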

Another disadvantage of bridge libraries is that they require a deeper understanding of programming and of the logic of each language. In that sense, they can be a bit confusing and have a higher barrier to entry than other libraries. After all, nobody can guarantee that they work perfectly with the ever-changing languages they support. What if one of those languages changes in its newest release? Can the bridge library still be trusted?

Specific bridge libraries commonly used in our field

Let's now look at some specific bridge libraries that are popular in the high-level languages often used in data science projects.

Python-R

Leveraging R in Python – rpy2

Since there is often lots of legacy code in R, and most of the libraries in that language are decent and well-maintained, it makes sense that you might want to use this resource in your Python scripts. Fortunately, the rpy2 package has you covered. This package works with Python 3.7 or newer and R 4.0 or newer. To use it, first install it with pip (or whatever package manager you are using), and then import it into your environment using something like:

>>> import rpy2.rinterface as ri

Then, initialize the embedded R process using the command:

>>> ri.initr()

Next, do whatever you need to do in R, e.g., parse a snippet of R code with the parse() function, which returns an unevaluated R expression:

>>> result = ri.parse('1 + 1')

Finally, you can shut down R using the command:

>>> ri.endr()
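Putting these steps together, here is a minimal, self-contained sketch. It goes one step further than parse() by actually evaluating the expression. It assumes rpy2 3.x's low-level API, looking up R's own eval() function in the base environment with baseenv.find() and applying it to the parsed expression. The import is guarded, so the script returns None instead of crashing on a machine without rpy2 or R. That guard is a design choice made for this illustration, and eval_r_or_none is a hypothetical helper name. The sketch deliberately skips endr(), since R generally cannot be restarted in the same Python process once it has been shut down.

```python
def eval_r_or_none(code: str):
    """Evaluate a snippet of R code through rpy2's low-level interface.

    Hypothetical helper: returns the first element of the R result, or None
    if rpy2 is not installed or R cannot be started on this machine.
    """
    try:
        import rpy2.rinterface as ri
        ri.initr()  # start the embedded R process
    except Exception:
        # rpy2 is missing, or R could not be found/started.
        return None

    expr = ri.parse(code)             # unevaluated R expression
    r_eval = ri.baseenv.find("eval")  # R's own eval() function
    result = r_eval(expr)             # evaluate it inside R
    return result[0]                  # first element of the returned R vector


print(eval_r_or_none("1 + 1"))  # 2.0 when R is available, otherwise None
```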

Beyond the low-level rinterface module, rpy2 includes other modules geared toward more specialized applications, such as the high-level robjects interface. You can learn more about this package in its comprehensive documentation, available at https://rpy2.github.io/doc/v3.5.x/html/overview.html

Note that this bridge library can also be used to port Python code to R. However, for this endeavor, you can also use libraries designed for this task, such as reticulate.

Leveraging Python in R – reticulate

This is quite a niche use case, since chances are that if you are using R, you don't care much for the wider programming world (R is a scripting language, quite specialized compared to other languages used in the field). Still, if you are open to extending your expertise and venturing into Python, the reticulate package can be a great place to start.

For example, you can use the Python REPL via the repl_python() function of this package as follows:

> reticulate::repl_python()

[R outputs some text related to the Python kernel that will start running]

>>> my_variable = [432, -32, 100, 10, -1]

>>> exit # get out of the Python REPL

Now, if you wish to return to the Python REPL, call the same reticulate function again, and you will find that any data you handled there is still available. Should you wish to access data that lives in the R environment, you can use the prefix "r." before your variable name. For example:

> my_data <- list("R", "is", "a", "nice", "niche", "scripting", "language")

> reticulate::repl_python()

>>> my_data_in_Python = r.my_data

>>> my_data_in_Python

['R', 'is', 'a', 'nice', 'niche', 'scripting', 'language']

Python-Julia

Leveraging Python in Julia – PyCall

Although the incentives for calling Python from the Julia prompt are somewhat different from those at the R prompt, it's good to be able to do it. After all, unlike some silo languages that boast high performance, Julia gets along very well with its programming buddies. So, if you have some Python libraries that you wish to summon while in your Julia environment, the PyCall package will do the trick. It also converts common data structures, such as arrays, between the two languages automatically. You can add PyCall like any other package and then use it as follows:

julia> using Pkg; Pkg.add("PyCall") # or, in package mode: (@v1.x) pkg> add PyCall

julia> using PyCall

Say that you wish to use the ubiquitous numpy package (the pyimport() function replaces the older @pyimport macro, which is deprecated):

julia> np = pyimport("numpy") # the name np is your choice, just like an alias in Python

julia> my_np_array = np.array([432, -32, 100, 10, -1]);

julia> println(np.median(my_np_array))

10.0

Leveraging Julia in Python – PyJulia

Sometimes when in Python, you need to do something fast. If you don't know C or are not very motivated to use it, Julia can jump in and do the heavy lifting for you. For this, the PyJulia bridge library is what you need. Once you install it with your package manager of choice (on PyPI, the package is simply named julia) and run julia.install() once to set up its Julia-side dependencies, you can use it the following way:

>>> from julia import SomeJuliaPackage # placeholder for any installed Julia package

>>> SomeJuliaPackage.fun(input) # placeholder function and argument

If you wish to access the environment where your variables live (among other things), you can import Julia's Main module:

>>> from julia import Main

You can fiddle with the variables there this way:

>>> Main.my_variable = [1234, -1234, 10]

If you want to do something useful with that variable in Julia, you can even run custom Julia code, as long as you pass it as a string. Since median() lives in Julia's Statistics standard library, load that first:

>>> Main.eval("using Statistics")

>>> Main.eval("median(my_variable)")
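Following the advice later in this article to keep things working without the auxiliary language where possible, here is a minimal sketch that wraps this PyJulia pattern with a pure-Python fallback. It assumes that `from julia import Main` raises an exception when Julia or PyJulia is unavailable. In that case the function uses Python's statistics.median instead. The names median_with_fallback and my_values are hypothetical and exist only for this illustration. As in the example above, median() has to be loaded from Julia's Statistics standard library.

```python
import statistics


def median_with_fallback(values):
    """Compute a median in Julia via PyJulia, falling back to pure Python.

    Hypothetical helper: the fallback keeps the script usable on machines
    where Julia (or the PyJulia package) is not installed.
    """
    try:
        from julia import Main  # starts Julia; fails if it is unavailable
    except Exception:
        return statistics.median(values)

    Main.my_values = list(values)  # copy the data into Julia's Main module
    Main.eval("using Statistics")  # median() lives in Statistics
    return Main.eval("median(my_values)")


print(median_with_fallback([1234, -1234, 10]))  # 10 via Python, 10.0 via Julia
```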

R-Julia

Leveraging R in Julia – Rcall

Let's shift gears and look at R again, this time within Julia. Although the latter has an extensive collection of packages covering most of what R has to offer, you might have some legacy code in that language that you wish to leverage in your Julia projects. If that's the case, RCall is a valuable resource for you, and one that you can install easily by following the same steps as for the PyCall package. After you have it up and running, you can use it as follows:

julia> using RCall

julia> my_variable = 123

By pressing "$", the R REPL becomes active, and you can call whatever functionality from that language you wish. To refer to a Julia variable from there, prefix its name with "$":

R> x <- $my_variable

R> x

[1] 123

If you don't like this switcheroo business, you can evaluate some R code directly from the Julia prompt this way:

julia> R"code"

For example, if you want a few random numbers following the normal distribution, you can type:

julia> randos = R"rnorm(5)"

RObject{RealSxp}

[1] -0.4139974 -1.9438456 1.5334304 0.6654199 -0.3249515

Note that the output is an RObject. You can convert such an object to a native Julia type using the rcopy() function:

julia> rcopy(randos)

5-element Vector{Float64}:
 -0.4139974
 -1.9438456
  1.5334304
  0.6654199
 -0.3249515

Leveraging Julia in R – JuliaCall

JuliaCall may be the fastest way to migrate from R to Julia, as many R users did back in the day, even before Julia reached production-level maturity. The main incentive here is speed: Julia is often many times faster than R and can run explicit loops about as fast as vectorized calculations, making it great for those R users who are more programming-oriented. Once you install it, you can use it the following way:

> library(JuliaCall)

> julia_setup(installJulia = TRUE)

This latter command searches for a Julia installation and, failing to find one, installs Julia for you. You can then use the library as follows:

> julia <- julia_setup() # initialize the Julia kernel

> julia_eval("sqrt(9)")

Alternatively, you can make things more explicit:

> julia_call("sqrt", 9) # that’s particularly useful if you already have some variable in R you wish to process using Julia

> julia_assign("x", 9); julia_eval("sqrt(x)") # copy an R value into a Julia variable, then use it in Julia code

If you need a particular Julia package, you can install it as necessary and then use it this way:

> julia_install_package_if_needed("JuliaPackage")

> julia_library("JuliaPackage")

Useful considerations about bridge libraries

Bridge libraries may be great, at least in theory, but when using them, make sure they are well-maintained. Sometimes they don't deliver what they promise, or they break down as the complexity of the task increases, leaving you in a difficult situation. They are great for a proof-of-concept project, though.

It goes without saying that for a bridge library to function as expected, the auxiliary language must be in an operational state on your machine. In many cases, it's best to have the location of the executable of the language in your PATH variable for it to be called properly through the bridge library.
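Before debugging a bridge library, it can save time to confirm that the auxiliary language's executable is actually reachable through PATH. A minimal sketch using Python's standard shutil.which() follows. The executable names "R" and "julia" are assumptions about how these languages are typically installed, and find_interpreters is a hypothetical helper name.

```python
import shutil


def find_interpreters(names=("R", "julia")):
    """Hypothetical helper: map each executable name to its full path, or None."""
    # shutil.which() searches the PATH environment variable (and, on Windows,
    # also tries extensions such as .exe listed in PATHEXT).
    return {name: shutil.which(name) for name in names}


for name, path in find_interpreters().items():
    print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If an entry comes back as None, add the directory containing that executable to PATH (or point the bridge library at it explicitly) before trying again.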

What’s more, it's good to try to get things to work without auxiliary languages whenever possible, as this makes for more robust code. Borrowing functionality from another language may be great at times, but long term, your scripts may be more challenging to maintain. After all, by employing a bridge library you are using two distinct languages, so whoever reviews your code needs to be familiar with both.

Furthermore, it's a good idea to have ample documentation in your code, both the wrapper script and the auxiliary ones (e.g., the scripts in the other language you leverage through the bridge library). This way, if something goes wrong, it would be easier to troubleshoot the problem and come up with a solution. Even if bridge libraries can make your programs more complex, it doesn't mean they need to be unmanageable. Some good comments will take a few minutes to write, but they can save you hours in the debugging stage.

Final thoughts

So, there you have it. We covered the topic of bridge libraries, at least to some extent, and hopefully made it clear how you can leverage them in your data projects. Note that there are plenty more similar libraries, e.g., bridging C with various high-level languages. However, if we included these too, this article would be very long!

Much like other programming-related matters, the best way to learn more about this topic is to do a project around it and get your hands dirty! So, open up a Jupyter notebook, fire up your programming language of choice, and see how you can harness the code of other languages in your projects! Learning can (and ought to) be a pleasant experience.

Once you are ready to take a break from experimenting, feel free to share your experiences in the AIgents forum and connect with other professionals with a similar knack for coding. Cheers.

6 July 2022

This is a contribution from Zacharias Voulgaris.