What's all this DNA sequencing software stuff about – anyhow?

02-11-2018 | By Paul Whytock

The human genome, the DNA that makes us tick, is vastly complicated, and just writing down its entire genetic code would fill around one million pages of very small type.

Analysing all that is no small task. Fundamentally, deoxyribonucleic acid (DNA) carries the instructions for making proteins, and proteins carry out most of life's functions within organisms. Genetic diseases are caused by mutations in DNA that change the protein product.

The development of fast and sophisticated DNA sequencing systems has helped biological and medical research and has become an essential tool when it comes to medical diagnosis.

So all the science-related stuff about DNA is fascinating, but before we get onto the subject of some very smart genome sequence analysis software, I got to thinking about another, albeit totally unrelated, non-biological form of DNA and how it affects our daily lives.

What I'm talking about is product design DNA. What on earth do I mean by that? Take, for example, the second biggest financial investment most of us make after buying a home: the motorcar. For a lot of motorists, the way a car has evolved from its early models to its current version can have an enormous influence on whether they are going to part with their hard-earned cash.

Take the Golf GTi as a prime example of a product that has retained its early design DNA. Launched in 1976, it handled brilliantly and was powered by an excellent fuel-injected 1600cc engine. It very quickly grabbed the number one, must-have hot-hatchback slot.

Why? Well, quite simply, it was a terrific car to drive. Many people would say it just had the right product characteristics, but I would argue that the VW designers of the time created an automotive design DNA that can still be found more than 40 years later in today's Golf GTis.

There are plenty of other design DNA examples. Leica cameras, for example. The first Leica camera prototype was built at the Ernst Leitz optical company back in 1914, and the name Leica is derived from the first three letters of the surname Leitz and the first two of the word camera.

Leica cameras were, and still are, exceptionally well built and use optically excellent Summarit and Summicron lenses. Today, over a century later, Leica cameras are still the product of choice for many professional and amateur snappers.

Then there are the thousands of everyday products we don't really think about, HP Sauce for example. It was invented by a grocer named Frederick Garton, who registered the name HP Sauce in 1895 and blended a long list of ingredients into a tomato-based sauce. The ingredient-led DNA of this product has stayed the same, and it still sells around 26 million bottles a year.

But putting design-led product DNA aside and getting back to biological DNA sequencing analysis, sequencing is the process of determining the precise order of nucleotides in DNA, whether along a single chromosome or an entire genome. It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine and thymine) in a strand of DNA. The speed of modern DNA sequencing technology has been instrumental in mapping complete DNA sequences, or genomes.

So anything that speeds up sequencing analysis grabs headlines, and at a recent health event in California, a software tool that does exactly that was unveiled.

Called elPrep 4.0, it is designed to speed up whole-genome and exome sequencing pipelines and is claimed to save a laboratory hundreds of hours of computer processing time.

Sequencing operations need to divide a human genome into millions of short fragments, which are then fed to the sequencing machines to identify the individual bases. This produces huge data files that are processed through a pipeline of tools to reconstruct the original DNA sequence from the fragments and to identify DNA variants that may indicate genetic disorders.
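To make the fragment-and-reconstruct idea a little more concrete, here is a deliberately tiny Go sketch. It assumes the fragments arrive in the right order and overlap exactly, stitches them together at their longest overlap, and then compares the result with a reference sequence to flag differing bases. The function names overlap, assemble and variants are hypothetical, and this is not how elPrep or GATK work: real pipelines align millions of reads against a reference genome and use statistical models to call variants.

```go
package main

import (
	"fmt"
	"strings"
)

// overlap returns the length of the longest suffix of a that is also a
// prefix of b, considering only overlaps of at least minLen bases.
func overlap(a, b string, minLen int) int {
	limit := len(a)
	if len(b) < limit {
		limit = len(b)
	}
	for n := limit; n >= minLen; n-- {
		if strings.HasSuffix(a, b[:n]) {
			return n
		}
	}
	return 0
}

// assemble stitches fragments together in the order given (an assumption),
// appending each fragment minus the part that overlaps the sequence so far.
func assemble(fragments []string, minLen int) string {
	if len(fragments) == 0 {
		return ""
	}
	seq := fragments[0]
	for _, f := range fragments[1:] {
		n := overlap(seq, f, minLen)
		seq += f[n:]
	}
	return seq
}

// variants compares a sample sequence with a reference base by base and
// reports every position (counting from zero) where they differ.
func variants(reference, sample string) []string {
	var out []string
	for i := 0; i < len(reference) && i < len(sample); i++ {
		if reference[i] != sample[i] {
			out = append(out, fmt.Sprintf("pos %d: %c>%c", i, reference[i], sample[i]))
		}
	}
	return out
}

func main() {
	// Hypothetical overlapping fragments of a short stretch of DNA.
	fragments := []string{"ATTAGAC", "GACCTGC", "CTGCAAT"}
	sample := assemble(fragments, 3)
	fmt.Println(sample)

	// Hypothetical reference sequence for the same stretch.
	reference := "ATTAGACCTGGAAT"
	fmt.Println(variants(reference, sample))
}
```

Running it prints the reconstructed sequence ATTAGACCTGCAAT and flags one difference from the reference: a G replaced by a C at position 10.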

So what makes elPrep different? First, its architecture executes a pipeline in a single pass through the data, regardless of how many steps the pipeline contains.
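The single-pass idea can be illustrated with a minimal Go sketch, assuming a much simplified read record. Rather than each tool reading a whole file, processing it and writing a new file for the next tool, every step is applied to one read before moving on to the next read, so adding steps does not add passes over the data. Read, Step, runSinglePass and exampleSteps are hypothetical names, not elPrep's actual API, and the quality threshold is an arbitrary example value.

```go
package main

import (
	"fmt"
	"strings"
)

// Read is a hypothetical, heavily simplified stand-in for one sequenced
// fragment; real tools work with SAM/BAM alignment records.
type Read struct {
	Name    string
	Bases   string
	Quality int
}

// Step updates a read through the pointer and reports whether to keep it.
type Step func(r *Read) bool

// exampleSteps returns three illustrative preparation steps (hypothetical).
func exampleSteps(minQuality int) []Step {
	return []Step{
		// Normalise bases to upper case.
		func(r *Read) bool {
			r.Bases = strings.ToUpper(r.Bases)
			return true
		},
		// Drop reads whose quality score is below the threshold.
		func(r *Read) bool { return r.Quality >= minQuality },
		// Drop reads containing an undetermined base 'N'.
		func(r *Read) bool { return !strings.Contains(r.Bases, "N") },
	}
}

// runSinglePass applies every step to each read within one loop over the
// data, so there is still only one pass however many steps are added.
func runSinglePass(reads []Read, steps []Step) []Read {
	out := make([]Read, 0, len(reads))
	for _, r := range reads { // r is a copy, so the input slice is unchanged
		keep := true
		for _, step := range steps {
			if !step(&r) {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	reads := []Read{
		{"r1", "acgtna", 35},
		{"r2", "ACGT", 12},
		{"r3", "ttagc", 40},
	}
	for _, r := range runSinglePass(reads, exampleSteps(20)) {
		fmt.Println(r.Name, r.Bases)
	}
}
```

Steps such as sorting or duplicate marking need to see many reads at once, so a real tool has to do more than chain per-read filters like these.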

It is designed as a multi-threaded application that runs entirely in memory, avoids repeated file I/O and merges the computation of several DNA sequencing preparation steps. In real terms, this means elPrep is claimed to be up to ten times faster than other software tools using the same computing resources.
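Running in memory on many threads can be sketched in Go along these lines, assuming the reads already sit in a slice in RAM. The slice is split into chunks, each chunk is handled by its own goroutine, and results go straight into a preallocated slice rather than to intermediate files. GC content (the fraction of G and C bases) is used purely as a stand-in computation; processInMemory and gcContent are hypothetical and not part of elPrep.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// gcContent returns the fraction of G and C bases in an upper-case read.
func gcContent(bases string) float64 {
	if len(bases) == 0 {
		return 0
	}
	gc := strings.Count(bases, "G") + strings.Count(bases, "C")
	return float64(gc) / float64(len(bases))
}

// processInMemory splits an in-memory slice of reads into contiguous chunks,
// processes each chunk on its own goroutine and writes the results into a
// preallocated slice, so nothing is written to disk between steps.
func processInMemory(reads []string, workers int) []float64 {
	if workers < 1 {
		workers = 1
	}
	results := make([]float64, len(reads))
	chunk := (len(reads) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(reads); start += chunk {
		end := start + chunk
		if end > len(reads) {
			end = len(reads)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				// Each goroutine writes only its own indices, so no lock is needed.
				results[i] = gcContent(strings.ToUpper(reads[i]))
			}
		}(start, end)
	}
	wg.Wait()
	return results
}

func main() {
	reads := []string{"GGCC", "ATAT", "gcat", "GATTACA"}
	fmt.Println(processInMemory(reads, runtime.NumCPU()))
}
```

Giving each goroutine a disjoint range of indices keeps the workers independent, which is one common way to use all available cores without extra synchronisation.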

It is designed as a drop-in replacement that delivers the same results as the corresponding preparation steps of GATK 4.0 (the Genome Analysis Toolkit), developed by the Broad Institute. It is written in the Go language and is available as open source under the GNU Affero General Public License v3.

Paul Whytock is Technology Correspondent for Electropages. He has reported extensively on the electronics industry in Europe, the United States and the Far East for over thirty years. Prior to entering journalism, he worked as a design engineer with Ford Motor Company at locations in England, Germany, Holland and Belgium.