  • spacings on a torus
    While in Brussels last week, I noticed an interesting question on X validated that I considered on the train back home and then more over the weekend. This is a question about spacings, namely how long on average does it take to cover an interval of l…
    - 18 hours ago 21 Mar 18, 11:18pm -
  • R Tip: Break up Function Nesting for Legibility
    There are a number of easy ways to avoid illegible code nesting problems in R. In this R tip we will expand upon the above statement with a simple example. At some point it becomes illegible and undesirable to compose operations by nesting them, such…
    - 1 day ago 21 Mar 18, 2:31pm -
  • Automate R processes
    Last week we updated the cronR R package and released it to CRAN, allowing you to schedule any R code at whichever time point you like. The package was updated in order to comply with stricter CRAN policies regarding writing to folders. Along the li…
    - 1 day ago 21 Mar 18, 12:23pm -
  • Regression Analysis Essentials For Machine Learning
    Regression analysis consists of a set of machine learning methods that allow us to predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x). Briefly, the goal of a regression model is to build a mathematic…
    - 1 day ago 21 Mar 18, 6:35am -
  • ggplot2: How Geoms & Aesthetics ≈ Whipped Cream
    In this post I have a few goals: 1. Become (re-)familiar with available geoms; 2. Become (re-)familiar with aesthetic mappings in geoms (stroke, who knew?); 3. Answer these questions: How often do various geoms appear and how often do they … Continue…
    - 1 day ago 21 Mar 18, 6:08am -

High Scalability



  • AI, Machine Learning and Data Science Roundup: March 2018
    This is the first edition of a monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements, applications and events I've noted over the pas…
    - 21 hours ago 21 Mar 18, 8:45pm -
  • R and Docker
    If you regularly have to deal with specific versions of R, or different package combinations, or getting R set up to work with other databases or applications then, well, it can be a pain. You could dedicate a special machine for each configuration y…
    - 2 days ago 20 Mar 18, 10:39pm -
  • Because it's Friday: Email a tree
    The City of Melbourne has collected data on the more than 70,000 trees in the urban forest of this Australian metropolis. The data include the species, the health status of the tree and its life expectancy, all shown on a lovely map. As you can see f…
    - 6 days ago 16 Mar 18, 9:58pm -
  • R 3.4.4 released
    R 3.4.4 has been released, and binaries for Windows, Mac, and Linux are now available for download on CRAN. This update (codenamed "Someone to Lean On", likely a Peanuts reference, though I couldn't find which one with a quick search) is a minor bugfi…
    - 7 days ago 15 Mar 18, 7:42pm -
  • In case you missed it: February 2018 roundup
    In case you missed them, here are some articles from February of particular interest to R users. The R Consortium opens a new round of grant applications for R-related user groups and projects, and has issued US$0.5M in grants to date for R-related p…
    - 8 days ago 14 Mar 18, 9:40pm -

Data Analytics and R

  • If you did not already know
    Riemann-Theta Boltzmann Machine: A general Boltzmann machine with continuous visible and discrete integer-valued hidden states is introduced. Under mild … Continue reading →
    - 11 hours ago 22 Mar 18, 6:07am -
  • Whats new on arXiv
    Learning non-Gaussian Time Series using the Box-Cox Gaussian Process: Gaussian processes (GPs) are Bayesian nonparametric generative models that provide interpretability … Continue reading →
    - 13 hours ago 22 Mar 18, 4:05am -
  • Book Memo: “Introduction to HPC with MPI for Data Science”
    This gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard has been …Continue reading →
    - 15 hours ago 22 Mar 18, 2:03am -
  • Book Memo: “Probability and Statistics for Computer Science”
    This textbook is aimed at computer science undergraduates late in sophomore or early in junior year, supplying a comprehensive background …Continue reading →
    - 15 hours ago 22 Mar 18, 2:03am -
  • Distilled News
    The Machine Learning Reproducibility Crisis: I was recently chatting to a friend whose startup’s machine learning models were so disorganized … Continue reading →
    - 17 hours ago 22 Mar 18, 12:01am -

Google Cloud

The HortonWorks Blog

  • Current Challenges in Healthcare
    Guest blog written by Bendi Sowjanya, a microbiologist and technologist at B3DS. Health issues affect all populations globally, but the treatment and prevention of these issues vary widely depending on geographic location. The effectiveness of heal…
    - 1 day ago 21 Mar 18, 2:00pm -
  • Enabling Mission-Critical Data to Feed Clinical Decisions in Healthcare
    We’re excited to announce that we’ll be hosting an upcoming webinar with Clearsense LLC on March 29th! Clearsense is a smart data organization based in Jacksonville, Florida that is re-imagining and simplifying data analytics to help healthcare…
    - 3 days ago 19 Mar 18, 2:00pm -
  • Hortonworks Operational Services: Embark on Your Big Data Journey with Confidence
    Data-driven insights have been hailed as a differentiator, one that may render companies obsolete if they are unable to take advantage of this trend. With this realization, business and IT leaders are keen to get started quickly in order to exploit t…
    - 3 days ago 19 Mar 18, 1:00pm -
  • Manufacturing Industry Use Cases, Challenges, and Strategies for Dealing with Huge Data Volumes
    Last month, we held the most recent Manufacturing and Transportation Customer Community call. These calls occur a couple of times per quarter and act as an opportunity for leaders in both the manufacturing and transportation industries to have a roundta…
    - 6 days ago 16 Mar 18, 5:22pm -
  • IBM and Hortonworks Partnership Highlighted at IBM THINK 2018!
    Hortonworks and IBM’s partnership has brought multiple joint solutions for Global Data Management to the market. From HDP on Power Systems and Spectrum Scale Storage which provides customers fast access to data and a cost-effective platform for run…
    - 6 days ago 16 Mar 18, 2:00pm -


  • My review of Microsoft’s data science virtual machine (DSVM) for deep learning
    Over the past few months, I’ve been using Microsoft’s Ubuntu deep learning and data science virtual machine (DSVM) for a few projects I’m working on here at PyImageSearch. At first, I was a bit hesitant (and perhaps even a bit resistant) to giv…
    - 22 hours ago 21 Mar 18, 7:03pm -
  • Reading barcodes with Python and OpenMV
    What if I said that there’s a camera that is low cost at $65, runs MicroPython, and can be expanded with shields just like an Arduino/RPi? Meet OpenMV! I met Kwabena Agyeman, the founder of OpenMV, during the PyImageSearch Gurus Kickstarter campai…
    - 3 days ago 19 Mar 18, 2:00pm -
  • Python, argparse, and command line arguments
    Today we are going to discuss a fundamental developer, engineer, and computer scientist skill: command line arguments. Specifically, we’ll be discussing: what command line arguments are; why we use command line arguments; how to parse command lin…
    - 10 days ago 12 Mar 18, 2:00pm -
  • The 7 best deep learning books you should be reading right now
    In today’s post I’m going to share with you the 7 best deep learning books (in no particular order) I have come across and would personally recommend you read. Some of these deep learning books are heavily theoretical, focusing on the mathematics…
    - 17 days ago 5 Mar 18, 3:00pm -
  • Face detection with OpenCV and deep learning
    Today I’m going to share a little known secret with you regarding the OpenCV library: You can perform fast, accurate face detection with OpenCV using a pre-trained deep learning face detector model shipped with the library. You may already know tha…
    - 24 days ago 26 Feb 18, 3:00pm -

Walking Randomly

  • Research Software Engineer: A New Career Track?
    Along with fellow Fellow Chris Richardson, I wrote an article over at SIAM News about the emerging Research Software Engineering profession. Head over to Research Software Engineer: A New Career Track? to check it out. If this has whetted your ap…
    - 20 days ago 2 Mar 18, 7:12am -
  • Strange MATLAB performance issue on Microsoft Azure F72s_v2 instances
    I’m working on some MATLAB code at the moment that I’ve managed to reduce down to a bunch of implicitly parallel functions. This is nice because the data that we’ll eventually throw at it will be represented as a lot of huge matrices.  As such…
    - 22 days ago 1 Mar 18, 5:06am -
  • Creating a temporary, customised, multi-user HPC cluster for teaching using Amazon AWS and Alces Flight
    In a previous blog post, I told the story of how I used Amazon AWS and AlcesFlight to create a temporary multi-user HPC cluster for use in a training course.  Here are the details of how I actually did it. Note that I have only ever used this config…
    - 29 days ago 21 Feb 18, 12:22pm -
  • Bespoke High Performance Computing Clusters in the Cloud with Alces Flight
    I needed a supercomputer… quickly! One of the things that we do in Sheffield’s Research Software Engineering Group is host training courses delivered by external providers. One such course is on parallel programming using MPI for which we t…
    - 29 days ago 21 Feb 18, 9:24am -
  • Meltdown, Spectre and High Performance Computing
    The Meltdown bug, which affects most modern CPUs, has been called by some ‘the worst ever CPU bug’. Accessible explanations about what the Meltdown bug actually is are available here and here. Software patches have been made available but some peop…
    - 40 days ago 10 Feb 18, 9:47am -