A graphics toolkit for visualizing genomic data

There is no shortage of visualization tools in the world of genomics. But as new methods and data types emerge, existing techniques can struggle to cope. Now, a tool called Gosling allows bioscientists to create apps that can display genomic information with the same flexibility developers have come to expect from other graphics programming tools.

First published in 2020 by bioinformatician Nils Gehlenborg and his team at Harvard Medical School in Boston, Massachusetts, Gosling stands for “Grammar of Scalable Linked Interactive Nucleotide Graphics”1. The name is also an allusion to the structural biologist Raymond Gosling, who, together with Rosalind Franklin, took the famous “Photograph 51”, which revealed the structure of DNA.

Gosling is a so-called grammar. It is implemented in programming libraries that provide flexible syntax for describing genomic regions and interactions and how they should be laid out on a web page. Researchers and bioinformaticians can use these libraries to create interactive, scalable visualizations to share with their peers and to develop customized genetic analysis tools.

The views Gosling creates can be linked, so that selecting a region in one panel highlights the same region in another. They can also be panned and zoomed, from the level of whole chromosomes down to individual nucleotides. “The visual representation adapts to the zoom level,” says Gehlenborg, describing a feature called semantic zoom. An online test environment provides example visualizations that users can extend to create and export their own graphics. And libraries for Python (Gos) and JavaScript (gosling.js) allow bioscientists to program visualizations directly into Jupyter computational notebooks and other applications. An alpha version for R was released in July.

The libraries systematically relate datasets to their visualizations, says Tamara Munzner, a computer scientist at the University of British Columbia in Vancouver, Canada. Popular libraries such as ggplot2 and Vega-Lite use the “Grammar of Graphics” to define their visualizations. But those are general-purpose tools, whereas Gosling is designed specifically for genomics. “It’s like Vega-Lite for genomics,” says Munzner.
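To make the idea of semantic zoom concrete, here is a minimal Python sketch, not Gosling's implementation, in which the representation drawn for a region depends on how many base pairs fall on each screen pixel. The function names, thresholds and representation labels are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch of semantic zoom; not Gosling's API or thresholds.

def bases_per_pixel(start: int, end: int, width_px: int) -> float:
    """Return how many base pairs each horizontal pixel covers."""
    if end <= start or width_px <= 0:
        raise ValueError("need end > start and a positive width")
    return (end - start) / width_px


def choose_representation(start: int, end: int, width_px: int) -> str:
    """Pick a level of detail for the visible interval [start, end)."""
    bpp = bases_per_pixel(start, end, width_px)
    if bpp > 100_000:
        # Chromosome-scale view: draw a coarse ideogram.
        return "chromosome ideogram"
    if bpp > 0.125:
        # Intermediate zoom: draw genes as intervals.
        return "gene annotations"
    # At least 8 pixels per base: individual letters are legible.
    return "nucleotide letters"


if __name__ == "__main__":
    width = 800
    for start, end in [(0, 248_956_422), (0, 1_000_000), (0, 100)]:
        print(end - start, "bp ->", choose_representation(start, end, width))
```

In a linked layout, the same rule would simply be evaluated with each panel's own visible range and width, so a detail panel can show nucleotides while an overview panel still shows the ideogram.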

Closing the gap

Visualization programming tools range from template-based functions that create a standard type of chart with a single line of code, to libraries that assemble visualizations piece by piece from lines and geometric shapes, such as the JavaScript library D3.js. The template approach is easy to use but relatively inflexible; the piece-by-piece approach offers far more customization options but is cumbersome to use.

“Gosling really bridges that gap and makes it much easier to build new tools with visualization components,” says Maria Nattestad, a software engineer at Google in Mountain View, California. As part of her PhD in 2015, Nattestad developed a tool called SplitThreader that plots the genome in a circular layout known as a circos plot, with sequenced reads drawn as arcs to highlight structural variations. Lacking other options, she drew these elements from scratch, using D3.js to set the placement and dimensions of each line, rectangle and circle. “It was such a learning curve,” she says. “It took me a long time to build SplitThreader.” But, she adds, it probably could have been built much faster with Gosling.

Gehlenborg says Gosling emerged from a 2019 literature review2, during which his team explored the landscape of genome visualization and created a taxonomy for the tools and their capabilities. From there, researchers developed a syntax to systematically describe the visualizations these tools could create. Gosling, explains Gehlenborg, “is a fundamental approach to assemble genomic visualizations using the same taxonomy”.

Gosling encodes visualizations in a plain-text format called JavaScript Object Notation (JSON), using genomics-specific terms to complement the more general vocabulary of standard graphics libraries. Gosling.js, Gos and g(R)osling then work with this encoding in their respective programming languages. The final visualization is drawn in a web browser, using a rendering engine and file-format tools that the Gehlenborg team developed to visualize chromosome-conformation data from a technique called Hi-C3. Examples at gosling-lang.org provide starting points for circos plots, gene annotations, chromatin-conformation heatmaps, evolutionary conservation and more.
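The Python sketch below shows roughly what a declarative, JSON-based specification of this kind might look like for a single track of genomic intervals. It builds the specification as a plain dictionary and serializes it with the standard json module rather than calling the Gos library. The field names (tracks, mark, x, xe, color, layout) follow our reading of the Gosling schema and should be checked against the documentation at gosling-lang.org; the data URL and column names are placeholders.

```python
import json

# Placeholder data file; the URL and column names are hypothetical.
DATA_URL = "https://example.org/genes.csv"


def interval_track(title: str, color: str, height: int = 40) -> dict:
    """Describe one track declaratively: data source, mark and encodings."""
    return {
        "title": title,
        "data": {
            "type": "csv",
            "url": DATA_URL,
            "chromosomeField": "chrom",
            "genomicFields": ["start", "end"],
        },
        "mark": "rect",                               # draw each row as a box
        "x": {"field": "start", "type": "genomic"},   # left edge
        "xe": {"field": "end", "type": "genomic"},    # right edge
        "color": {"value": color},                    # constant fill colour
        "width": 600,
        "height": height,
    }


spec = {
    "title": "Genes of interest",
    "layout": "linear",
    "tracks": [interval_track("Gene intervals", "steelblue")],
}

if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
```

The design point is that the specification states what to draw (boxes spanning the start and end columns, in one colour) rather than how to draw each rectangle, which is the work that D3.js leaves to the programmer.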

Postdoc Sehi L’Yi, who led Gosling’s development, says what sets Gosling apart from other visualization tools is its expressiveness. With most tools, he says, the graphics that can be created and what they look like are predefined. “It’s really not easy to customize visualizations as a user.” But with Gosling, users can specify, for example, the color, dimensions, and placement of the symbol used to represent a centromere or a genomic interval, and then map that to an ideogram of a chromosome to highlight a region of interest.
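Behind options such as placing a symbol over a region of interest sits a mapping from genomic coordinates to screen positions, which a grammar handles so that the user does not have to. The sketch below spells out that mapping as a simple linear scale in Python; GenomicScale and highlight_rect are hypothetical helpers for illustration, not part of Gosling.

```python
from dataclasses import dataclass


@dataclass
class GenomicScale:
    """Hypothetical linear scale from base-pair positions to pixel offsets."""
    domain_start: int  # first base pair in view
    domain_end: int    # end of the view (exclusive)
    width_px: float    # width of the drawing area in pixels

    def __call__(self, position: int) -> float:
        span = self.domain_end - self.domain_start
        # Multiply before dividing to reduce floating-point rounding.
        return (position - self.domain_start) * self.width_px / span


def highlight_rect(scale: GenomicScale, start: int, end: int,
                   height_px: float) -> tuple[float, float, float, float]:
    """Return (x, y, width, height) of a box covering [start, end)."""
    x0, x1 = scale(start), scale(end)
    return (x0, 0.0, x1 - x0, float(height_px))


if __name__ == "__main__":
    # A 1-Mb view drawn 500 pixels wide; highlight the 200-300 kb region.
    scale = GenomicScale(domain_start=0, domain_end=1_000_000, width_px=500)
    print(highlight_rect(scale, 200_000, 300_000, height_px=20))
    # prints (100.0, 0.0, 50.0, 20.0)
```

Linked views rest on the same principle: a selection in one panel is a genomic interval, and each linked panel converts it to pixels with its own scale.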

An interesting space

A team of master's students at the University of British Columbia chose Gosling for their final project in a data visualization class. “One of my teammates heard about it at a conference last year,” says team member Armita Safa. “Even for someone who doesn’t have a programming background, Gosling is easier to work with than most other things used for visualization,” she says. However, she notes that at first the team struggled to extract the data they needed to let users click on regions and create new visualizations.

Dominic Girardi, chief product officer at data visualization company Datavisyn in Linz, Austria, has also experimented with Gosling, creating an interactive playground that allows users to filter a table of genes by genomic region. The company, which Gehlenborg co-founded, is now using Gosling to develop visualization tools for its corporate clients, although it hasn’t completed one yet, Girardi says.

Gosling isn’t the only visualization library for genomic data; other examples include ggbio, gggenomes and gggenes, which are all extensions of the ggplot2 graphics library. But most of these tools produce static images rather than interactive visualizations, says Gehlenborg. He says future plans for Gosling include giving it a graphical interface, so that researchers can create visualizations by dragging and dropping widgets onto a virtual canvas rather than having to code them.

Robert Buels, who is leading the development of a genome browser at the University of California, Berkeley, says Gosling “occupies a really interesting place” in the genomics visualization toolbox. “Gosling offers a lot more customization options,” he says, than template-based tools. But users don’t have to write nearly as much code as they do with tools like D3.js.

“It’s a really interesting niche between the two things,” he says, “which I think is a really great addition to the field.”
