A. Preface and MySQL configuration

The UCSC Genome Browser is one of the most essential tools in genomics research, and its value keeps growing with the current explosion of available Next Generation Sequencing data. Installing it is not a mainstream task: it requires a lot of patience and somewhat more than basic knowledge of the Linux environment and MySQL. Before you try it, make sure that you know how to install Linux packages (including from source), how to perform a basic MySQL and Apache setup, and how to run Perl and shell scripts. This guide is not exactly a step-by-step procedure, as it often refers to external sources, blogs and wikis found around the web. Building on the work of others, I tried to install a version that is as customizable as possible on my server, to be used by several labs at the institution where I currently work.

My installation is performed on an Ubuntu 12.04 LTS Server; adjust the steps for your distribution. Throughout this guide we assume that the base storage location is /media/HD2/, so you will often see the shell variable STORAGE="/media/HD2" (written $STORAGE in commands). We also assume a temporary directory, $TEMP, which defaults to /tmp.
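The two variables can be set once per shell session before running any of the commands in this guide. The following is a minimal sketch; the fallback syntax and the idea of putting it in a sourced file are my assumptions, while the two default values are the ones quoted above.

```shell
# Base storage and temporary directories used throughout this guide.
# The defaults mirror the values quoted above; export STORAGE or TEMP
# beforehand to override them.
export STORAGE="${STORAGE:-/media/HD2}"
export TEMP="${TEMP:-/tmp}"
echo "STORAGE=$STORAGE TEMP=$TEMP"
```

Since the shell expands $STORAGE before sudo runs, commands such as sudo mkdir $STORAGE/gbdb work as long as the variable is set in the calling shell.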

If you have MySQL 5.5 or later (5.5 is the default in Ubuntu >= 12.04), you must recompile it from source in order to enable the

load data local infile

MySQL command, which is disabled by default in more recent versions. To this end, you can follow the instructions here. In the cmake command, add:


Then, edit the /etc/mysql/my.cnf file, comment out all the SSL settings under [mysqld] and change

lc-messages-dir = /usr/share/mysql

to

lc-messages-dir = /usr/local/mysql/share

While in my.cnf, add the following lines under [mysqld]

key_buffer              = 1024M
max_allowed_packet      = 64M
thread_stack            = 512K
thread_cache_size       = 32
table_cache             = 1024
query_cache_limit       = 16M
query_cache_size        = 1024M
sort_buffer_size        = 16M
read_buffer_size        = 16M
read_rnd_buffer_size    = 32M
myisam_sort_buffer_size = 512M
bulk_insert_buffer_size = 1024M
join_buffer_size        = 512M
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size  = 64M
innodb_log_file_size    = 512M
innodb_buffer_pool_size = 32768M # Watch this as I have 64GB of RAM!
innodb_thread_concurrency = 16

the following under [myisamchk]

key_buffer              = 1024M
sort_buffer_size        = 512M
read_buffer             = 64M
write_buffer            = 64M

and the following under [isamchk]

key_buffer              = 1024M
sort_buffer_size        = 512M
read_buffer             = 64M
write_buffer            = 64M
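The [mysqld] values above are sized for a machine with 64GB of RAM, so innodb_buffer_pool_size in particular should be scaled to your own memory. The sketch below derives a suggestion from /proc/meminfo; the 50% ratio is an assumption read off the 32768M / 64GB example, not a tuning rule, and suggest_pool_mb is a hypothetical helper name.

```shell
# Suggest an innodb_buffer_pool_size (in MB) as a percentage of physical RAM.
# The default of 50% is an assumption taken from the 32768M / 64GB example.
suggest_pool_mb() {
    total_kb=$1
    percent=${2:-50}
    echo $(( total_kb / 1024 * percent / 100 ))
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "innodb_buffer_pool_size = $(suggest_pool_mb "$mem_kb")M"
fi
```

The other buffers in the list can be scaled down in the same spirit on smaller machines.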

Before starting the newly compiled MySQL server, follow EXACTLY the instructions here and, in a friendlier version, here to properly reconfigure InnoDB to store and process bigger bulk imports; this is mandatory for fast loading and use of custom tracks. You should also lock the MySQL version you just installed so that it is not affected by Ubuntu's update system. This is quite easy and a quick web search will show you how.

B. Installation of the UCSC Genome Browser web application and session system

This section contains instructions on how to install the Genome Browser application only. The visualization application, the session system and the other UCSC applications (e.g. the Table Browser) are independent of the background databases containing the genomic features. This section assumes basic knowledge of Apache, of installing packages from source and of MySQL administration.

  1. Create $STORAGE/gbdb and $STORAGE/genomebrowser directories
    sudo mkdir $STORAGE/gbdb
    sudo mkdir $STORAGE/genomebrowser
  2. Fetch the kent source tree to a $STORAGE/kent directory
    sudo mkdir $STORAGE/kent
    sudo git clone git://genome-source.cse.ucsc.edu/kent.git $STORAGE/kent
  3. Copy the contents of $STORAGE/kent/src/product/scripts to $STORAGE/scripts
    sudo mkdir $STORAGE/scripts
    sudo cp -r $STORAGE/kent/src/product/scripts/. $STORAGE/scripts/
  4. Open the Synaptic package manager and install libmysqlclient-dev and, more generally, the other libmysql development packages, to get the header files. Be careful at this point not to interfere with the new MySQL installation of section A. It might take some trial and error, but it should generally work at the first attempt. If not, perform this step before installing MySQL from source in section A.
  5. Optionally, enable SSL in MySQL, see here for detailed instructions. Fix the apparmor by following instructions here.
  6. Install SAM tools from source from here or look for the samtools package in Ubuntu synaptic application.
  7. Edit both your account's .bashrc file and /root/.bashrc and add the line
    export MACHTYPE=x86_64

    (replace x86_64 with your machine's architecture, which can be found with uname -p) and reload them.

    source ~/.bashrc
    su # ...and then your password
    source ~/.bashrc
  8. Edit $STORAGE/scripts/browserEnvironment.txt and change the following settings to the values shown below:
    export KENTHOME=$STORAGE"/kenthome/"
    export kentSrc=$STORAGE"/kent"
    export GBDB=$STORAGE"/gbdb"
    export BROWSERHOME=$STORAGE"/genomebrowser"
    #export MYSQLLIBS="/usr/lib/x86_64-linux-gnu/libmysqlclient.a -lz"
    export MYSQLLIBS="/usr/local/mysql/lib/libmysqlclient.a -lz"
    export MYSQLINC="/usr/local/mysql/include"
    export PNGLIB="/usr/lib/x86_64-linux-gnu/libpng.a"
    export PNGINCL="-I/usr/include/libpng12"
    export USE_BAM=1                    # uncomment this line
    export KNETFILE_HOOKS=1             # uncomment this line
    export SAMDIR=/opt/NGSTools/SAMTools
    export SAMINC=${SAMDIR}             # uncomment this line
    export SAMLIB=${SAMDIR}/libbam.a    # uncomment this line
    export AUTH_MACHINE="sevenofnine"
    export AUTH_USER="root"
  9. Prepare Apache. Enable XBitHack
    sudo a2enmod include

    In /etc/apache2/apache2.conf add the line

    XBitHack on

    Create a virtual host file called my_prefered_host_name in /etc/apache2/sites-available and paste the following, replacing $STORAGE with the actual path (Apache does not expand the shell variable):

    XBitHack on
    # Virtual host for genomebrowser
    <VirtualHost *:80>
    	ServerAdmin your_admin_mail@yourdomain.com
    	DocumentRoot $STORAGE/genomebrowser
    	ServerName genomebrowser
    	<Directory />
    		Order deny,allow
    		Deny from all
    		Options FollowSymLinks
    		AllowOverride None
    	</Directory>
    	<Directory $STORAGE/genomebrowser>
    		AllowOverride AuthConfig
    		Options +Includes
    		Order allow,deny
    		Allow from all
    	</Directory>
    	ScriptAlias /cgi-bin/ $STORAGE/genomebrowser/cgi-bin/
    	<Directory "$STORAGE/genomebrowser/cgi-bin">
    		AllowOverride None
    		Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    		Order allow,deny
    		Allow from all
    		AddHandler cgi-script .cgi .pl
    	</Directory>
    	ErrorLog $STORAGE/genomebrowser/logs/apache2/error.log
    	CustomLog $STORAGE/genomebrowser/logs/apache2/access.log combined
    	LogLevel warn
    	Alias /doc/ "/usr/share/doc/"
    	<Directory "/usr/share/doc/">
    		Options Indexes MultiViews FollowSymLinks
    		AllowOverride None
    		Order deny,allow
    		Deny from all
    		Allow from ::1/128
    	</Directory>
    	# Some security
    	ServerSignature Off
    </VirtualHost>

    Add the following line to /etc/hosts     genomebrowser

    Restart the networking service

    sudo /etc/init.d/networking restart

    Restart Apache

    sudo /etc/init.d/apache2 restart
  10. Create a MySQL user using the MySQL command line, webmin or phpMyAdmin (I created gbuser, password MY_PASSWORD, using webmin). You should have these tools anyway, as they are very handy for managing your system.
  11. Create the hg.conf file as described here, in Part 1: Genome Browser engine. Here is mine:
    # Configuration file for the UCSC Human Genome server
    # the format is in the form of name/value pairs, written as 'name=value'
    # note that there is no space between the name and its value. Also, no blank lines should be in this file.
    # db.host is the name of the MySQL host to connect to
    # db.user is the username used when connecting to the host
    # this is the password to use with the above hostname
    # central.host is the name of the host of the central MySQL
    # database where stuff common to all versions of the genome
    # and the user database is stored.
    # required to use hgLogin
    login.systemName=hgLogin CGI
    # url to server hosting hgLogin
    # name of cookie holding username - do not change!
    # name of cookie holding user id - do not change!
    # title of host of browser, this text be shown in the user interface of the login/sign up screens
    login.browserName=UCSC Genome Browser @Fleming
    # base url of browser install
    # signature written at the bottom of hgLogin system emails
    login.mailSignature=Local administrator: Panagiotis Moulos
    # from/return email address used for system emails

    The last lines (about login) will enable the independent login system of the browser so as to be able to host different users.

  12. Create a /root/bin/x86_64 directory and $STORAGE/kenthome/bin/x86_64 directory and a symbolic link
    sudo mkdir -p /root/bin/x86_64
    sudo mkdir -p $STORAGE/kenthome/bin/x86_64
    sudo ln -s $STORAGE/kenthome/bin/x86_64 /root/bin/x86_64
  13. Create the /gbdb symlink. Very important…
    sudo ln -s $STORAGE/gbdb /gbdb
  14. Before fetching the html files using updateHtml.sh, I edited the updateHtml.sh kent script in $STORAGE/scripts to also display the rsync output on stdout instead of only in the log. To do this, go to the ${RSYNC} commands towards the end of the script and replace
    >> ${FETCHLOG} 2>&1

    with

    2>&1 | tee -a ${FETCHLOG}

    Also add --verbose after ${RSYNC}. Save the file and then run it.

    sudo sh updateHtml.sh ./browserEnvironment.txt
  15. Before fetching and compiling the source, we have to patch SAMTools to enable network support for BAM files. This has to be done manually, as SAMTools does not yet support it. The patch, as well as full instructions on how to apply it, can be found here. Please follow them carefully.
  16. Now we have to run kentSrcUpdate.sh in order to fetch the latest code and build the binaries and CGIs from source. Open the kentSrcUpdate.sh script. Towards the end, replace the > daily.log etc. of the make commands with | tee -a (see also step 14) to display all messages on stdout. Then, run the script
    sudo sh kentSrcUpdate.sh ./browserEnvironment.txt
  17. Now we must download the hgcentral database. We use the fetchHgCentral.sh script for that
    sudo sh fetchHgCentral.sh go > $TEMP/hgcentral.sql
  18. We must set up an SQL database to receive the file that we just downloaded, along with a genome browser user. Ideally, we should have one user with SELECT permissions and another with ALL permissions. For now we set up a single user with ALL permissions, as the browser itself is graphical and does not allow for writing
    -e "CREATE USER 'gbuser'@'localhost' identified by 'password'; FLUSH PRIVILEGES;"
    TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
    -e "GRANT FILE ON *.* TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
    -e "GRANT SELECT ON hgFixed.* TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
  19. Import the hgcentral database
    mysql -ugbuser -ppassword hgcentral < $TEMP/hgcentral.sql

    The basic genome browser session functionality should be almost ready. We need to create a couple more symbolic links to custom JavaScript and CSS files

  20. Create the following symbolic links:
    sudo ln -s $STORAGE/genomebrowser/cgi-bin $STORAGE/genomebrowser/htdocs/cgi-bin
    sudo ln -s $STORAGE/genomebrowser/trash $STORAGE/genomebrowser/htdocs/trash
  21. Create the /usr/local/apache/htdocs directory (empty, apart from the goldenPath subdirectory needed by the last link) and then the following symbolic links:
    sudo mkdir -p /usr/local/apache/htdocs/goldenPath
    sudo ln -s $STORAGE/genomebrowser/htdocs/js /usr/local/apache/htdocs/js
    sudo ln -s $STORAGE/genomebrowser/htdocs/style /usr/local/apache/htdocs/style
    sudo ln -s $STORAGE/genomebrowser/htdocs/inc /usr/local/apache/htdocs/inc
    sudo ln -s $STORAGE/genomebrowser/htdocs/images /usr/local/apache/htdocs/images
    sudo ln -s $STORAGE/genomebrowser/htdocs/goldenPath/help /usr/local/apache/htdocs/goldenPath/help

    At this point the website should be partially functional. Now we have to install some genome databases.

  22. Change the ownership of the contents of the genomebrowser directory to www-data and restart apache
    sudo chown -R www-data:www-data $STORAGE/genomebrowser
    sudo /etc/init.d/apache2 restart
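The manual edit of step 14 (showing the rsync output on screen while still appending it to the log) can also be scripted. This is only a sketch: show_rsync_output is a hypothetical helper, and it assumes the redirection appears in the script exactly as quoted in step 14. A backup copy with a .bak suffix is kept next to the edited file.

```shell
# Rewrite ">> ${FETCHLOG} 2>&1" into "2>&1 | tee -a ${FETCHLOG}" so that the
# output is both displayed and appended to the log.
# Assumes the redirection is spelled exactly as in step 14.
show_rsync_output() {
    script=$1
    # [$] matches a literal dollar sign; \& is a literal ampersand in the replacement.
    sed -i.bak 's#>> [$]{FETCHLOG} 2>&1#2>\&1 | tee -a ${FETCHLOG}#' "$script"
}
```

For example, show_rsync_output $STORAGE/scripts/updateHtml.sh (run with sufficient permissions) would patch the script in place; compare the result with the .bak copy before running it.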

C. Installation of minimal genome databases

  1. Create a file named my.minimal.db.list.txt and type the following (for 5 organisms):
  2. Fetch the minimal gbdb information for these organisms by running the script fetchMinimalGbdb.sh. Before running it, edit the last lines, where fetchOne is called, and replace > with | tee -a to display information as before. Also add the --verbose option to the ${RSYNC} commands if the additional information is useful to you (it was for me!).
    sudo sh fetchMinimalGbdb.sh ./browserEnvironment.txt ./my.minimal.db.list.txt
  3. Fetch the minimal golden path database information for these organisms by running the script fetchMinimalGoldenPath.sh. Before running it, edit the script for more verbosity, as in step 2.
    sudo sh fetchMinimalGoldenPath.sh ./browserEnvironment.txt ./my.minimal.db.list.txt
  4. The hg18 SQL table creation files have a syntax problem (at least with my MySQL version). Go to $STORAGE/genomebrowser/htdocs/goldenPath/hg18/database and run
    sudo sed -i 's/TYPE=/ENGINE=/g' *.sql
  5. Load the minimal golden path databases fetched with the script above
    sudo sh loadDb.sh ./browserEnvironment.txt hg18
    sudo sh loadDb.sh ./browserEnvironment.txt hg19
    sudo sh loadDb.sh ./browserEnvironment.txt mm9
    sudo sh loadDb.sh ./browserEnvironment.txt mm10
    sudo sh loadDb.sh ./browserEnvironment.txt dm3
    sudo sh loadDb.sh ./browserEnvironment.txt hgFixed
  6. Grant access to the new genome browser user
    for DB in hg18 hg19 mm9 mm10 dm3
  7. You should now have basic track functionality in the local version of the UCSC Genome Browser. However, not much can be done apart from custom track exploration and sequence retrieval, as there are no gene annotations etc. The next section explains how we can customize the UCSC databases a bit further than the "take the minimum or all" approach of the kent scripts.
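The per-database loop of step 6 can be sketched as follows. The body of the loop is not shown above, so the SELECT-only privilege and the gbuser account name used here are assumptions (they follow the hgFixed grant in step B.18); grant_select_sql is a hypothetical helper that only prints the statements.

```shell
# Print one GRANT statement per genome database.
# The SELECT privilege and the user name are assumptions; adjust as needed.
grant_select_sql() {
    user=$1
    shift
    for DB in "$@"; do
        printf "GRANT SELECT ON %s.* TO '%s'@'localhost';\n" "$DB" "$user"
    done
    printf 'FLUSH PRIVILEGES;\n'
}

grant_select_sql gbuser hg18 hg19 mm9 mm10 dm3
```

After reviewing the printed statements, they can be piped to the MySQL client of an administrative account, e.g. grant_select_sql gbuser hg18 hg19 | mysql -uroot -p.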

D. Installation of other genome database tables

  1. As it is very costly in disk space (and mostly useless) to install the full mirror of the Genome Browser databases, and there is no straightforward way to determine which feature corresponds to which table, I created a Perl script called fetchCustomDb.pl to fetch the tables we need. This is not completely automatic, however: it takes some manual work to determine the tables for the required features (I did it using the UCSC Table Browser) and to note them down in the YAML configuration file that the Perl script requires. The YAML file is quite self-explanatory; it contains these tables, in a format the Perl script understands, together with other variables. The script can be downloaded from here and the YAML parameter file from here. I created the table list by fetching the tables for the features of interest in the UCSC Table Browser, then viewing the source of the page and copy-pasting the contents of the respective SELECT list. This could of course be made more systematic by using some package to scrape the page (it is on the TODO list…). Built this way, the final table list contains a lot of duplicates, as many tables are interconnected, but the script takes care of that. As the tables and the way they are organized across genomes are a mess (e.g. in some genomes features are split per chromosome, in others not), you need to explore the UCSC FTP server a bit to determine the layout. Another script takes care of the external databases that have to be installed (e.g. GO, UniProt, etc.) by directly using mysqldump on the UCSC server. It can be downloaded from here. Once done, you can pass these tables to the parameter file and the script takes care of the rest. After you define all these, you just run
    sudo perl fetchCustomDb.pl --param your_param_file.yml

    The parameter file is optional if your needs are the same as mine (my parameters are also loaded by default). It is advisable to use the --dry parameter first to see the total amount of data to be downloaded, as UCSC data continue to expand.

    sudo perl fetchCustomDb.pl --param your_param_file.yml --dry

    The script uses a Perl interface for rsync. One would wonder why not use a simple shell script with multiple rsync lines. The answer for me is easy usage, reusability, elegance, system and maintenance!

  2. Now we must reload the database tables. This will be done with the kent script loadDb.sh. However, keep in mind that in order to use this script, the databases created in step C.5 must be dropped as this script works only if the databases do not exist. This can be easily done using a GUI tool such as phpMyAdmin or webmin and sufficient user privileges, or even in command line by
    -e "DROP DATABASE genome_to_be_dropped;"

    Do NOT drop the hgcentral database! The latest version of the aforementioned Perl script takes care of that for you. Just read the documentation.

  3. Set up a cron job in /etc/cron.weekly to clean the trash data from custom tracks. You can do this either in webmin or using the script below; name it clean-gb-trash and set its permissions to 755. Cron does not inherit your shell variables, so define STORAGE at the top of the script:
    find $STORAGE/genomebrowser/trash/ \! \( -regex "$STORAGE/genomebrowser/trash/ct/.*" \
     -or -regex "$STORAGE/genomebrowser/trash/hgSs/.*" \) -type f -amin +10080 -exec rm -f {} \;
    find $STORAGE/genomebrowser/trash/    \( -regex "$STORAGE/genomebrowser/trash/ct/.*" \
     -or -regex "$STORAGE/genomebrowser/trash/hgSs/.*" \) -type f -amin +20160 -exec rm -f {} \;


    sudo chmod 755 /etc/cron.weekly/clean-gb-trash
  4. Finally, download the processed Genbank files from UCSC FTP path /gbdb/genbank/./data/processed/* to $STORAGE/gbdb/genbank/./data/processed/
    sudo rsync --archive --compress --partial --recursive --progress --stats --verbose --human-readable \
    rsync://hgdownload.cse.ucsc.edu/gbdb/genbank/data/processed/* \
    $STORAGE/gbdb/genbank/data/processed/
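Step 1 mentions that the table list copied from the Table Browser contains many duplicates. fetchCustomDb.pl handles them itself, but if you want to inspect a cleaned list before writing it into the YAML file, a sketch like the one below may help. It assumes one table name per line in a plain text file; dedupe_tables and the file name are hypothetical.

```shell
# Print the unique, non-empty table names from a file with one name per line.
# The one-name-per-line layout is an assumption about how the list was saved.
dedupe_tables() {
    grep -v '^[[:space:]]*$' "$1" | sort -u
}
```

For example, dedupe_tables my_tables.txt | wc -l shows how many distinct tables will actually be requested.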

E. Setting up the custom track database

This section describes how to set up support for the custom track database, in order to avoid using the trash directories and achieve faster access. This feature was added to the UCSC Genome Browser at a later stage. It is advised that you follow this step, as it is recommended for proper user sessions.
  1. Enable the custom track database. This is not handled by the kent scripts. To do this, first create the customTrash database in MySQL
  2. Create another user to work with custom tracks, e.g. ctgbuser (I created ctgbuser with password ctbguser@mydomain).
    -e "CREATE USER 'ctgbuser'@'localhost' IDENTIFIED BY 'password'; FLUSH PRIVILEGES;"
    TO 'ctgbuser'@'localhost'; FLUSH PRIVILEGES;" 
    -e "GRANT FILE ON *.* TO 'ctgbuser'@'localhost'; FLUSH PRIVILEGES;"
  3. Create a temporary directory which is used by this functionality
    sudo mkdir $STORAGE/genomebrowser/data/tmp
    sudo chown -R www-data:www-data $STORAGE/genomebrowser/data/tmp
  4. Enter the following items to hg.conf
  5. Make sure you do all things below as root, either with su or with sudo. Create a hidden directory .conf in $STORAGE/genomebrowser.
    sudo mkdir $STORAGE/genomebrowser/.conf

    In this directory, create the .ct.hg.conf file with the following contents:


    and set its permissions to 600

    sudo chmod 600 $STORAGE/genomebrowser/.conf/.ct.hg.conf

    Next, place a copy of hg.conf there too

    sudo cp $STORAGE/genomebrowser/cgi-bin/hg.conf  $STORAGE/genomebrowser/.conf/.hg.conf

    Finally, create two symbolic links in /root for these files

    sudo ln -s $STORAGE/genomebrowser/.conf/.hg.conf /root/.hg.conf
    sudo ln -s $STORAGE/genomebrowser/.conf/.ct.hg.conf /root/.ct.hg.conf
  6. In /etc/cron.daily create the tmp cleaner script (cleans it daily) and name it clean-gb-tmp:
    find $STORAGE/genomebrowser/data/tmp -type f -amin +1440 -exec rm -f {} \;

    and then

    sudo chmod 755 /etc/cron.daily/clean-gb-tmp
  7. Create the following script to be used with a cron job (better schedule it through webmin tool) to periodically clean the custom tracks database and name it clean-gb-ctdb
    DS=`date "+%Y-%m-%d"`
    YYYY=`date "+%Y"`
    MM=`date "+%m"`
    export DS YYYY MM
    mkdir -p $STORAGE/genomebrowser/data/trashLog/localhost/${YYYY}/${MM}
    export RESULT
    sudo $STORAGE/kenthome/bin/x86_64/dbTrash -age=168 -drop -verbose=2 \
    > ${RESULT} 2>&1

    and then

    sudo chmod 755 /etc/cron.daily/clean-gb-ctdb

    This will clean it weekly and keep a log. Don't forget to add $STORAGE/kenthome/bin/x86_64/dbTrash to your sudoers file, so that it does not ask for password confirmation. A quick web search will show you how to do this; it's easy. The reason for this is that dbTrash uses .hg.conf, which is located under the /root home, and the default cron user (which is root by the way, strange…) cannot find it.
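The age-based cleanup used in step 6 can be wrapped in a small function so that the directory and the age threshold are explicit and the script does nothing when the directory is missing. This is a sketch under the assumption that last access time (-amin) is the right criterion, as in the find command above; clean_older_than is a hypothetical name.

```shell
# Delete regular files under a directory whose last access is older than
# the given number of minutes (the same test as find ... -amin +N above).
clean_older_than() {
    dir=$1
    minutes=$2
    # Do nothing if the directory does not exist.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -amin "+$minutes" -exec rm -f {} \;
}
```

In a cron script this could be called as clean_older_than /media/HD2/genomebrowser/data/tmp 1440, with the path written out in full because cron does not know $STORAGE.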

F. Setting up the Blat server (optional but required for most molecular biology labs)

  1. Most tools required by blat have already been compiled (gfServer, gfClient, faToNib and blat). If some of them have not been compiled in step B.16, compile them separately (see also the first note in Notes).
  2. Update the blatServers table in the hgcentral database with the address of your host (usually localhost).
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "USE hgcentral; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%hg18%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%hg19%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%mm9%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%mm10%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%dm3%';"

    It would be good to back up this table first, or to note down the initial entries, in case something goes wrong. You can also change the default ports (explore the relevant tables using the MySQL command line or phpMyAdmin).

  3. Everything else is straightforward. See also the instructions here on how to launch the blat server. I created a startup script and made it run every time my machine boots (with a certain delay, unfortunately). It also contains a lot of hardcoded paths, which I plan to change.
    sudo chmod +x /etc/init.d/sblat
    sudo update-rc.d sblat defaults
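The repeated UPDATE statements of step 2 can be generated from a list of assemblies, which makes it easier to add a genome later. The sketch below only prints the SQL; blat_update_sql is a hypothetical helper, and the table and column names are the ones used in step 2.

```shell
# Print the UPDATE statements that point the blatServers entries of the
# given assemblies to a host (statements only; nothing is executed).
blat_update_sql() {
    host=$1
    shift
    printf 'USE hgcentral;\n'
    for DB in "$@"; do
        # %% prints a literal percent sign for the SQL LIKE pattern.
        printf "UPDATE blatServers SET host='%s' WHERE db LIKE '%%%s%%';\n" "$host" "$DB"
    done
}

blat_update_sql localhost hg18 hg19 mm9 mm10 dm3
```

The output can be piped to mysql -uUSER_WITH_WRITE_PERMISSIONS -p once it looks right. For the backup suggested in step 2, mysqldump -uUSER_WITH_WRITE_PERMISSIONS -p hgcentral blatServers > blatServers.backup.sql saves the table first (the output file name is just an example).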


  • I faced a lot of problems with building the whole kent source tree. For example, dbTrash, a tool needed for cleaning the customTrash database, was not compiling… There was a problem with the mysql and sql libraries, so I installed everything related to dev packages from synaptic and typed make libs in the /src directory of the kent source (a tip I found here). The tool was not compiled at first, but when I visited /src/hg/dbTrash and typed make, it compiled. All of the above was done as root. I compiled the hgsql tool in the same way.
  • It is recommended that you migrate the MySQL database storage folder from the default location, as the table sizes will explode fast, especially if you want to host a lot of features, so as to keep the filesystem light. The process is not very difficult and explained in many blogs/forums. Just google for it.
  • Be sure also to have a lot of available space for the gbdb directory, as all the big tracks (not suitable for a database, e.g. ENCODE tracks and genome files) are stored there.
  • To insert a new table in any Genome Browser database without loading everything from the beginning:
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD --local-infile -e \
    "load data local infile '/path/to/my/table.ext' into table my_genome.table;"

    The table must first have been created from the respective table.sql file! I will soon provide a couple of example scripts, as so far I have often had to add extra tables (according to the needs of my colleagues) without rebuilding everything from the beginning.

  • There is a general problem with the $MACHTYPE environment variable outside the kent shell scripts that are used for the general Genome Browser build (the /src/product/scripts). I fixed this by manually editing the makefile of each additional tool I wanted to use (e.g. the spToDb tool) and replacing the line
    MYLIBDIR = ../../../lib/$(MACHTYPE)

    with

    MYLIBDIR = ../../../lib/x86_64


  • Log rotation scripts for the logs maintained by most of the UCSC Genome Browser tools… If someone has done it, I would really appreciate some sharing!
  • Better scraping of the Table Browser pages for table names?
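The makefile edit described in the $MACHTYPE note above can be applied with sed instead of by hand. This is a sketch that assumes the MYLIBDIR line is written exactly as quoted in that note; fix_mylibdir is a hypothetical helper and x86_64 is only the default architecture. A .bak copy of the original makefile is kept.

```shell
# Replace $(MACHTYPE) in the MYLIBDIR line of a makefile with a fixed
# architecture name. Assumes the line looks like the one quoted in the note.
fix_mylibdir() {
    makefile=$1
    arch=${2:-x86_64}
    # [$] matches the literal dollar sign of $(MACHTYPE).
    sed -i.bak "s|^MYLIBDIR = \(.*\)/[\$](MACHTYPE)|MYLIBDIR = \1/$arch|" "$makefile"
}
```

For example, running fix_mylibdir makefile inside the directory of the tool you want to build changes only that one line and leaves the rest of the file untouched.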