Creating Custom Functions In PostgreSQL

RMAG news

In PostgreSQL, custom functions can be created to solve complex problems.

These can be written using the default PL/pgSQL scripting language, or they can be written in another scripting language.

Python, Perl, Tcl and R are some of the scripting languages supported.

While PL/pgSQL comes with any Postgres installation, to use other languages requires some setup.

Installing the extension

Before an extension can be used, the extension package needs to be installed.

On Ubuntu you would run:

Perl

sudo apt-get -y install postgresql-plperl-14

The package name ‘postgresql-plperl-14’ is specific to PostgreSQL version 14. If you’re using a different version of PostgreSQL, you need to change the version number in the package name to match your installed PostgreSQL version.

Python 3

sudo apt-get install postgresql-plpython3-14

Activating the extension

To activate the extension in PostgreSQL the extension must be defined using the CREATE EXTENSION statement.

Perl

CREATE EXTENSION plperl;

Python

CREATE EXTENSION plpython3;

Hello world example

Once the extension has been created, a custom function can be created using the extension.

Perl

CREATE OR REPLACE FUNCTION hello(name text)
RETURNS text AS $$
my ($name) = @_;
return “Hello, $name!”;
$$ LANGUAGE plperl;

Python

CREATE OR REPLACE FUNCTION hello(name text)
RETURNS text AS $$
return “Hello, “ + name + “!”
$$ LANGUAGE plpython3;

Breaking this down line by line

CREATE OR REPLACE FUNCTION hello(name text)

This line is how a function is created in Postgres. By using CREATE OR REPLACE, it will overwrite whatever function is already defined with the name hello with the new function.

Using CREATE FUNCTION hello(name text) will prevent the function from overwriting an existing function and will error if the function already exists.

RETURNS text AS $$

This defines what Postgres data type will be returned, it’s important that the data type specified is a type recognized by Postgres. A custom data type can be specified, if the custom type is already defined.

$$ is a delimiter to mark the beginning and end of a block of code. In this line it’s marking the start of the code block.

All code between the start and end $$ will be executed by Postgres

$$ LANGUAGE plperl;

$$ denotes the end of the script and tells Postgres what language the script should be parsed as.

Using the function

Functions can be used like any built-in Postgres function

SELECT hello(‘world’);

This will return a column with the value Hello world!

Functions can be part of more complex queries:

SELECT id, title, hello(‘world’) greeting FROM table;

More complex example

Here is an example function that accepts text from a field and returns a word count.

CREATE OR REPLACE FUNCTION word_count(paragraph text)
RETURNS json AS $$
use strict;
use warnings;

my ($text) = @_;

my @words = $text =~ /w+/g;
my $word_count = scalar @words;

my $result = ‘{‘ .
‘”word_count”:’ . $word_count .
‘}’;
return $result;
$$ LANGUAGE plperl;

This returns a JSON formatted result with the word count.

We can add more detailed statistics to the function.

CREATE OR REPLACE FUNCTION word_count(paragraph text)
RETURNS json AS $$
use strict;
use warnings;

my ($text) = @_;

my @words = $text =~ /w+/g;

my $word_count = scalar @words;

my $sentence_count = ( $text =~ tr/!?./!?./ ) || 0;

my $average_words_per_sentence =
$sentence_count > 0 ? $word_count / $sentence_count : 0;

my $result = ‘{‘ .
‘”word_count”:’ . $word_count . ‘,’ .
‘”sentence_count”:’ . $sentence_count . ‘,’ .
‘”average_words_per_sentence”:”‘ . sprintf(“%.2f”, $average_words_per_sentence) . ‘”‘ .
‘}’;

return $result;
$$ LANGUAGE plperl SECURITY DEFINER;

Now when we use it in a query

SELECT word_count(text_field) word_count FROM table

It will return JSON like

{“word_count”:116,“sentence_count”:15,“average_words_per_sentence”:“7.73”}

Security considerations

When using custom functions or external scripting languages, there are additional security considerations to take into account. It can be a juggling act to get the right balance between usability and security.

Security Definer vs Security Invoker

In the previous function, SECURITY DEFINER option was added to the create function statement.

It’s important to think about how you want a function run from a security point of view.

The default behavior is to use SECURITY INVOKER. This will run the function with the privileges of the user who is running the function.

SECURITY DEFINER provides more control over the privileges granted to the function. Using this mode, the function will run with the privileges of the user who created the function.

This can be both good and bad, if a function is created by a user with limited privileges, then there is little harm that can be done to the database.

If the function is created by a user with high access privileges, then the function will run with those same privileges. Depending on the type of function, this could allow a user to run the function with more open privileges than they have been granted.

There are times where this is useful, for example, if a user does not have read privileges to a table, but within the function , read is required, using SECURITY DEFINER can allow the required read privileges for the function to run.

Trusted and untrusted extensions

When creating the extensions above, plperl and plpython3 were used. In most circumstances these are the right extensions to use.

These extensions have limited access to the servers file system and system calls.

Extensions can also be created with a u (plpython3u, plperlu)

These are untrusted extensions and allow more access to the servers file system.

There may be cases where this is required, for example, if you want to use Perl modules, Python Libraries, or use system calls.

In the example above, the JSON output was generated as a string, if desired, the perl JSON module could have been used to encode the data as JSON. To do this would require using the untrusted extension to access the JSON module.

It’s advisable to not use the untrusted extensions, but if necessary, use with caution and understand the potential risks.

If Perl is being used, Perl will run in taint mode when the untrusted extension is in use.

Final Thoughts

Being able to take advantage of Perls advanced text processing and memory management, or Pythons data analytic libraries within PostgreSQL can be a really powerful tool.

Passing off complex tasks to tools more suited to handling the task can reduce overhead on the database.

As always, when using custom functions and external scripting languages, take precautions to ensure secure usage.

Please follow and like us:
Pin Share