English to Chinese: DeepVariant Accuracy Improvements for Genetic Datatypes
General field: Marketing
Detailed field: IT (Information Technology)
Source text - English
Last December we released DeepVariant, a deep learning model that has been trained to analyze genetic sequences and accurately identify the differences, known as variants, that make us all unique. Our initial post focused on how DeepVariant approaches “variant calling” as an image classification problem, and is able to achieve greater accuracy than previous methods.
Today we are pleased to announce the launch of DeepVariant v0.6, which includes some major accuracy improvements. In this post we describe how we train DeepVariant, and how we were able to improve DeepVariant's accuracy for two common sequencing scenarios, whole exome sequencing and polymerase chain reaction sequencing, simply by adding representative data into DeepVariant's training process.
Many Types of Sequencing Data
Approaches to genomic sequencing vary depending on the type of DNA sample (e.g., from blood or saliva), how the DNA was processed (e.g., amplification techniques), which technology was used to sequence the data (e.g., instruments can vary even within the same manufacturer) and what section or how much of the genome was sequenced. These differences result in a very large number of sequencing "datatypes".
Typically, variant calling tools have been tuned for one specific datatype and perform relatively poorly on others. Given the extensive time and expertise involved in tuning variant callers for new datatypes, it seemed infeasible to customize each tool for every one. In contrast, with DeepVariant we are able to improve accuracy for new datatypes simply by including representative data in the training process, without negatively impacting overall performance.
Truth Sets for Variant Calling
Deep learning models depend on having high quality data for training and evaluation. In the field of genomics, the Genome in a Bottle (GIAB) consortium, which is hosted by the National Institute of Standards and Technology (NIST), produces human genomes for use in technology development, evaluation, and optimization. The benefit of working with GIAB benchmarking genomes is that their true sequence is known (at least to the extent currently possible). To achieve this, GIAB takes a single person's DNA and repeatedly sequences it using a wide variety of laboratory methods and sequencing technologies (i.e. many datatypes) and analyzes the resulting data using many different variant calling tools. A tremendous amount of work then follows to evaluate and adjudicate discrepancies to produce a high-confidence "truth set" for each genome.
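As a rough illustration of how a truth set is used, the sketch below compares a set of variant calls against a truth set and counts true positives, false positives, and false negatives. It is a deliberate simplification: the `Variant` tuple and `compare_to_truth` function are hypothetical names, the matching is exact on (chromosome, position, ref, alt), and real benchmarking also restricts the comparison to high-confidence regions and handles equivalent representations of the same variant.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_to_truth(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, int]:
    """Count exact-match TP, FP and FN between called and truth variants."""
    return {
        "TP": len(calls & truth),   # called and present in the truth set
        "FP": len(calls - truth),   # called but absent from the truth set
        "FN": len(truth - calls),   # in the truth set but not called
    }


# Invented toy data, for illustration only.
truth = {Variant("20", 100, "A", "G"), Variant("20", 200, "C", "CT")}
calls = {Variant("20", 100, "A", "G"), Variant("20", 300, "G", "T")}
counts = compare_to_truth(calls, truth)
```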
The majority of DeepVariant’s training data is from the first benchmarking genome released by GIAB, HG001. The sample, from a woman of northern European ancestry, was made available as part of the International HapMap Project, the first large-scale effort to identify common patterns of human genetic variation. Because DNA from HG001 is commercially available and so well characterized, it is often the first sample used to test new sequencing technologies and variant calling tools. By using many replicates and different datatypes of HG001, we can generate millions of training examples, which helps DeepVariant learn to accurately classify many datatypes, and even generalize to datatypes it has never seen before.
Improved Exome Model in v0.5
In the v0.5 release we formalized a benchmarking-compatible training strategy that withholds from training a complete sample, HG002, as well as any data from chromosome 20. HG002, the second benchmarking genome released by GIAB, is from a male of Ashkenazi Jewish ancestry. Testing on this sample, which differs in both sex and ethnicity from HG001, helps to ensure that DeepVariant is performing well for diverse populations. Additionally, reserving chromosome 20 for testing guarantees that we can evaluate DeepVariant's accuracy for any datatype that has truth data available.
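The hold-out rule described above can be sketched as a simple filter over labeled examples. This is a minimal illustration under stated assumptions, not DeepVariant's actual training code: the `Example` record, `is_training_example`, and `split_examples` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical labeled training example (not a DeepVariant type)."""
    sample: str    # e.g. "HG001"
    chrom: str     # e.g. "chr1" or "20"
    position: int


HELD_OUT_SAMPLES = {"HG002"}   # withheld entirely from training
HELD_OUT_CHROMS = {"20"}       # withheld for every sample


def normalize_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so 'chr20' and '20' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def is_training_example(ex: Example) -> bool:
    """True only if neither the sample nor the chromosome is held out."""
    return (ex.sample not in HELD_OUT_SAMPLES
            and normalize_chrom(ex.chrom) not in HELD_OUT_CHROMS)


def split_examples(examples: Iterable[Example]) -> Tuple[List[Example], List[Example]]:
    """Partition examples into (training, held-out) lists."""
    train, held_out = [], []
    for ex in examples:
        (train if is_training_example(ex) else held_out).append(ex)
    return train, held_out


# Invented toy data, for illustration only.
examples = [
    Example("HG001", "chr1", 100),
    Example("HG001", "chr20", 5),
    Example("HG002", "chr1", 7),
    Example("HG001", "20", 9),
]
train, held_out = split_examples(examples)
```

Only the first example survives as training data here; the other three are held out either because they come from HG002 or because they fall on chromosome 20.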
In v0.5 we also focused on exome data, which is the subset of the genome that directly codes for proteins. The exome is only ~1% of the whole human genome, so whole exome sequencing (WES) costs less than whole genome sequencing (WGS). The exome also harbors many variants of clinical significance, which makes it useful for both researchers and clinicians. To increase exome accuracy we added a variety of WES datatypes, provided by DNAnexus, to DeepVariant's training data. The v0.5 WES model shows 43% fewer indel (insertion-deletion) errors and a 22% reduction in single nucleotide polymorphism (SNP) errors.
The total number of exome errors for HG002 across DeepVariant versions, broken down by indel errors (left) and SNP errors (right). Errors are either false positive (FP), colored yellow, or false negative (FN), colored blue. The largest accuracy jump is between v0.4 and v0.5, largely attributable to a reduction in indel FPs.
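The error totals in a comparison like this are the sum of false positives and false negatives, and figures such as "43% fewer errors" are relative reductions between versions. The sketch below shows that arithmetic; the function names are hypothetical and the counts are invented placeholders chosen only to make the calculation easy to follow, not the actual HG002 results.

```python
def total_errors(false_positives: int, false_negatives: int) -> int:
    """Total errors are the sum of false positives and false negatives."""
    return false_positives + false_negatives


def relative_reduction(old_errors: int, new_errors: int) -> float:
    """Fraction by which errors dropped going from the old to the new version."""
    if old_errors <= 0:
        raise ValueError("old_errors must be positive")
    return (old_errors - new_errors) / old_errors


# Placeholder counts, NOT real DeepVariant results.
old_total = total_errors(false_positives=60, false_negatives=40)
new_total = total_errors(false_positives=30, false_negatives=27)
reduction = relative_reduction(old_total, new_total)
print(f"{reduction:.0%} fewer errors")
```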
Improved Whole Genome Sequencing Model for PCR+ data in v0.6
Our newest release of DeepVariant, v0.6, focuses on improved accuracy for data that has undergone DNA amplification via polymerase chain reaction (PCR) prior to sequencing. PCR is an easy and inexpensive way to amplify very small quantities of DNA, and the amplified DNA, once sequenced, yields what is known as PCR positive (PCR+) sequencing data. It is well known, however, that PCR can be prone to bias and errors, and non-PCR-based (or PCR-free) DNA preparation methods are increasingly common. DeepVariant's training data prior to the v0.6 release was exclusively PCR-free data, and PCR+ was one of the few datatypes for which DeepVariant had underperformed in external evaluations. By adding PCR+ examples to DeepVariant's training data, also provided by DNAnexus, we have seen significant accuracy improvements for this datatype, including a 60% reduction in indel errors.
DeepVariant v0.6 shows major accuracy improvements for PCR+ data, largely attributable to a reduction in indel errors. Here we re-analyze two PCR+ samples that were used in external evaluations, including DNAnexus on the left (see details in figure 10) and bcbio on the right, showing how indel accuracy improves with each DeepVariant version.
Independent evaluations of DeepVariant v0.6 from both DNAnexus and bcbio are also available. Their analyses support our findings of improved indel accuracy, and also include comparisons to other variant calling tools.
We released DeepVariant as open source software to encourage collaboration and to accelerate the use of this technology to solve real world problems. As the pace of innovation in sequencing technologies continues to grow, including more clinical applications, we are optimistic that DeepVariant can be further extended to produce consistent and highly accurate results. We hope that researchers will use DeepVariant v0.6 to accelerate discoveries, and if there is a sequencing datatype that you would like to see us prioritize, please let us know.
Translation - Chinese
去年 12 月，我们发布了 DeepVariant，这是一种深度学习模型，经训练后可以分析基因序列并准确识别差异（称为变异，是区分不同人的关键因素）。我们最初的文章集中讨论了 DeepVariant 如何从图像分类角度解决“变异识别”问题并且比以往的方法更精确。
Nice to e-meet you here. I am a professional English to Chinese translator and a native speaker of Simplified Chinese.
I have more than 8 years of localization experience as a full-time translator, and previously spent 2 years as a multi-language project manager at EC Innovations. I am now a full-time freelance translator. With extensive experience in localization and translation, I understand localization requirements and standards, and the importance of professionalism, efficiency, and on-time delivery. I guarantee that all of my work is carefully revised, finalized, and delivered to a high standard.
My translation services cover IT, Marketing (Tourism, Hotel, Real Estate, Advertising), Business, and Medical/Life Science.
I currently work for Workday as a contract English > Chinese translator and am a regular translator for Google, LinkedIn, Facebook, Johnson & Johnson, and Dell/EMC.
• Mar. 2015 – Present Full-time English > Simplified Chinese freelance translator
• Jul. 2010 – Mar. 2015 EC Innovations Full-time translator, editor & project manager
• 2006 – 2010 Yunnan Agriculture University
Bachelor's Degree in Biochemistry
WORKING LANGUAGE PAIRS
• English to Simplified Chinese (Mandarin)
• Simplified Chinese to English
• Marketing (Hotel / Tourism / Real Estate): App Annie, The Ritz-Carlton, Hilton, One the Waterfront, Best Western International, The Sukhothai Bangkok, Smart Destinations, Costa Cruises, London Priority Club, Jetradar, Cap St Georges, Boscolo Collection, Starwood Hotels, Palazzo Parigi
• Medical (Healthcare/Pharmaceuticals/Laboratory Instrument): Philips Healthcare, Siemens Healthcare, Carestream Health, Elekta, AstraZeneca, TAK, Hitachi, Cochlear, Rocket Medical, Merck Millipore, Epocal, Waters, Agilent, Thermo Fisher Scientific, PerkinElmer, Chinese Medical Association, OneSight