PostgreSQL ========== .. _connecting_to_psql_command_line: Connecting to PostgreSQL databases from the command line -------------------------------------------------------- The `psql client `_ is the standard tool for connecting to PostgreSQL databases from the command line. It is pre-installed on all Faculty servers. To connect to a PostgreSQL database, you need to know: - its `hostname`: this is usually a string like ``customers.mydomain.com``; - its `port`: the default port for PostgreSQL databases is 5432; - the name of the database that you want to connect to. If this is a new server, the only database on it is likely to be named ``postgres``; - your username and password for the database server (note that this is different to your Faculty username). If you are unsure of these, you should ask your database administrator. You can then connect to the database with: .. code-block:: bash $ psql --host HOSTNAME --port PORT --user USERNAME --dbname DATABASE_NAME You should then be prompted for a password. If the port is the default 5432, you may omit the ``--port PORT`` argument. For instance, I can connect to database ``github`` running on host ``github-scraper.cpllpj7texvd.eu-west-1.rds.amazonaws.com`` with username ``awsuser``: .. thumbnail:: images/postgresql_example.png If you get stuck, run ``psql --help | less`` in a terminal to view a list of all available options. Connecting to PostgreSQL databases from Python ---------------------------------------------- To connect to PostgreSQL databases from Python, you can use the Python module `psycopg2 `_, which comes pre-installed on Faculty Jupyter and JupyterLab servers: .. code-block:: python import psycopg2 connection = psycopg2.connect( host='customers.mydomain.com', # host on which the database is running database='database_name', # name of the database to connect to user='username', # username to connect with password='password' # password associated with your username ) cursor = connection.cursor() cursor.execute('SELECT * FROM customers') customers = list(cursor.fetchall()) print('We have {} customers'.format(len(customers))) # This is data science! connection.close() .. note:: We close the connection to allow the database server to reclaim resources. This can be critical in a Jupyter notebook, since the kernel remains alive for a long time. The psycopg2 object can also be used in a ``with`` statement to ensure it gets closed automatically: .. code-block:: python import psycopg2 with psycopg2.connect( host='customers.mydomain.com', database='database_name', user='username', password='password' ) as connection: cursor = connection.cursor() cursor.execute('SELECT * FROM customers') customers = list(cursor.fetchall()) print('We have {} customers'.format(len(customers))) You can also use the ``read_sql()`` function in pandas to read the result of a query directly into a DataFrame: .. code-block:: python import psycopg2 import pandas with psycopg2.connect( host='customers.mydomain.com', database='database_name', user='username', password='password' ) as connection: df = pandas.read_sql('SELECT * FROM customers', connection) print(df) .. note:: We recommend avoiding pasting database passwords and other connection details in many notebooks in a project. Have a look at :ref:`package_connection_details` for a recommended strategy for managing database connection details. Connecting to PostgreSQL databases from R ----------------------------------------- To connect to PostgreSQL databases from an RStudio server, firstly install the dependencies for the database driver by running the following command: .. code-block:: bash $ sudo apt-get install unixodbc-dev odbc-postgresql Alternately, specify ``unixodbc-dev`` and ``odbc-postgresql`` in the System section of a custom environment, and apply this environment to your RStudio server (see :ref:`faculty_environments`). Once the ODBC drivers are installed, install the ``odbc`` library for R by running ``install.packages("odbc")`` in an R session. Run ``library(odbc)`` to import the library, and check which drivers are installed on the system by running ``odbc::odbcListDrivers()``. To connect to a PostgreSQL database from R create a connection object: .. code-block:: R connection <- DBI::dbConnect( odbc::odbc(), Driver = "psqlodbca.so", Server = "customer.mydomain.com", # Host on which the database is running Database = "database_name", # Name of the database UID = "username", # Database username PWD = "password", # Database password Port = 5432 # Database port (5432 is the PosgreSQL default) ) Once you have a connection object, you can list the tables in the database with: .. code-block:: R dbListTables(connection) You can then query a table with ``dbSendQuery`` and ``dbFetch``: .. code-block:: R result <- dbSendQuery(connection, "SELECT * FROM customers") dbFetch(result) where ``dbSendQuery`` sends the query and ``dbFetch`` retrieves the results. For other kind of queries, just use ``dbSendQuery``.