• H2O Benchmark for CSV Import
    The importFile() function in H2O is extremely efficient thanks to parallel reading. The benchmark comparison below shows that it is comparable to read.df() in SparkR and significantly faster than the generic read.csv().
    - 3 hours ago 26 Jun 17, 5:24am -
  • Data Visualization with googleVis exercises part 4
    Adding Features to your Charts: We saw in the previous charts some basic and well-known types of charts that googleVis offers to users. Before continuing with other, more sophisticated charts in the next parts we are going to “dig a little deeper”…
    - 17 hours ago 25 Jun 17, 4:00pm -
  • R Weekly Bulletin Vol – XII
    This week’s R bulletin will cover topics on how to resolve some common errors in R. Hope you like this R weekly bulletin. Enjoy reading! Shortcut Keys 1. Find and Replace – Ctrl+F 2. Find Next – F3 3. Find Previous – Shift+F3 Problem Solving…
    - 23 hours ago 25 Jun 17, 9:57am -
  • Using Tweedie Parameter to Identify Distributions
    In the development of operational loss models, it is important to identify which distribution should be used to model operational risk measures, e.g. frequency and severity. For instance, why should we use the Gamma distribution instead of the Invers…
    - 1 day ago 25 Jun 17, 2:55am -
  • Using tidycensus and leaflet to map Census data
    Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling…There should be a word for “the regret felt when an R 📦, which…
    - 2 days ago 24 Jun 17, 12:00am -

High Scalability

  • Gone Fishin'
    Well, not exactly Fishin', but I'll be on a month long vacation starting today. I won't be posting (much) new content, so we'll all have a break. Disappointing, I know. Please use this time for quiet contemplation and other inappropriate activities…
    - 24 days ago 2 Jun 17, 3:12pm -
  • Stuff The Internet Says On Scalability For May 26th, 2017
    Hey, it's HighScalability time: Sport imitating tech. Cloud Computing chases down Classic Empire to win...the Preakness. (Daily News) If you like this sort of Stuff then please support me on Patreon. 42%: increase US wireless traffic since…
    - 31 days ago 26 May 17, 3:56pm -
  • Sponsored Post: Etleap, Pier 1, Aerospike, Loupe, Clubhouse, Stream, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp
    Who's Hiring? Pier 1 Imports is looking for an amazing Sr. Website Engineer to join our growing team!  Our customer continues to evolve the way she prefers to shop, speak to, and engage with us at Pier 1 Imports.  Driving us to innovate more way…
    - 34 days ago 23 May 17, 3:56pm -
  • Stuff The Internet Says On Scalability For May 19th, 2017
    Hey, it's HighScalability time: Who wouldn't want to tour the Garden of Mathematical Sciences with Plato as their guide? If you like this sort of Stuff then please support me on Patreon. 2 billion: Android users; 1,000: cloud TPUs freel…
    - 38 days ago 19 May 17, 3:56pm -
  • Is Serverless the New Visual Basic?
    With Serverless, hiring less experienced developers can work out better than hiring experienced cloud developers. That's an interesting point I haven't heard before and it was made by Paul Johnston, CTO of movivo, in The ServerlessCast #6 - Event-D…
    - 42 days ago 15 May 17, 4:35pm -


  • The tips and tricks I used to succeed on Kaggle
    I learned machine learning through competing in Kaggle competitions. I entered my first competitions in 2011, with almost no data science knowledge. I soon ended up in fifth place out of a hundred or so in a stock trading competition. Over the…
    - 4 days ago 22 Jun 17, 3:00pm -
  • What's the difference between a data analyst, scientist and engineer?
    Data is increasingly shaping the systems that we interact with every day. Whether you’re using Siri, searching Google, or browsing your Facebook feed, you’re consuming the results of data analysis. Given its transformational ability, it’s no wo…
    - 11 days ago 15 Jun 17, 8:00am -
  • SQL Basics: Working with Databases
    SQL, pronounced “sequel” (or ess-cue-ell, if you prefer), is a very important tool for data scientists to have in their repertoire. You may well have heard the name and wondered what it is, how it works and whether you should learn it. To put it…
    - 48 days ago 9 May 17, 11:00am -
  • Getting Started with Kaggle: House Prices Competition
    Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real world data and to test their skills with, and agai…
    - 52 days ago 5 May 17, 6:00am -
  • What's New in v1.19: Multiscreen, Concepts, Dataset Preview and More!
    Our version 1.19 release includes new features designed to improve your learning experience. The first thing you may notice is a new look. We’ve made some design tweaks, including a new mission-text font which we think you’ll agree makes e…
    - 55 days ago 2 May 17, 3:00pm -


  • Because it's Friday: Mario in the Park
    I got my first chance to use HoloLens just a couple of weeks ago. It was pretty amazing to see a virtual wind turbine appear in the room with me, and to be able to walk around it and see how it was performing. But here's a much more fun application o…
    - 2 days ago 23 Jun 17, 11:26pm -
  • The R community is one of R's best features
    R is incredible software for statistics and data science. But while the bits and bytes of software are an essential component of its usefulness, software needs a community to be successful. And that's an area where R really shines, as Shannon Ellis e…
    - 3 days ago 23 Jun 17, 6:06pm -
  • Interactive R visuals in Power BI
    Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it…
    - 4 days ago 22 Jun 17, 7:21pm -
  • Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support
    The Windows edition of the Data Science Virtual Machine (DSVM), the all-in-one virtual machine image with a wide collection of open-source and Microsoft data science tools, has been updated to the Windows Server 2016 platform. This update brings buil…
    - 4 days ago 21 Jun 17, 8:40pm -
  • R leads, Python gains in 2017 Burtch Works Survey
    For the past four years, recruiting firm Burtch Works has conducted a simple survey of data scientists with just one question: "Which do you prefer to use — SAS, R or Python". The results for this year's survey of 1,046 respondents are in: R: 40% (…
    - 5 days ago 20 Jun 17, 9:25pm -

Data Analytics and R

  • R Packages worth a look
    Tukeys Trend Test via Multiple Marginal Models (tukeytrend): Provides wrapper functions to the multiple marginal model function mmm() of package ‘multcomp’ …
    - 2 hours ago 26 Jun 17, 6:07am -
  • If you did not already know
    Speech Analytics: Speech analytics is the process of analyzing recorded calls to gather information, brings structure to customer interactions and …
    - 6 hours ago 26 Jun 17, 2:03am -
  • Book Memo: “Guide to Computational Modelling for Decision Processes”
    Theory, Algorithms, Techniques and Applications. This interdisciplinary reference and guide provides an introduction to modeling methodologies and models which form …
    - 9 hours ago 26 Jun 17, 12:01am -
  • Magister Dixit
    “Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs” Knowledge Graphs are important tools to model multi-relational data that serves as information …
    - 10 hours ago 25 Jun 17, 10:23pm -
  • Document worth reading: “Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs”
    Knowledge Graphs are important tools to model multi-relational data that serves as information pool for various applications. Traditionally, these graphs …
    - 14 hours ago 25 Jun 17, 6:19pm -

Google Cloud

The HortonWorks Blog


  • Image Difference with OpenCV and Python
    In a previous PyImageSearch blog post, I detailed how to compare two images with Python using the Structural Similarity Index (SSIM). Using this method, we were able to easily determine if two images were identical or had differences due to slight im…
    - 7 days ago 19 Jun 17, 2:00pm -
  • PyImageSearch Gurus member spotlight: Saideep Talari
    In today’s blog post, I interview Saideep Talari, a PyImageSearch Gurus graduate who was recently hired as a computer vision engineer at a startup in India. Saideep’s story holds a special place in my heart as it’s so incredibly sincere, gen…
    - 14 days ago 12 Jun 17, 2:00pm -
  • Computing image “colorfulness” with OpenCV and Python
    Today’s blog post is inspired by a question I received from a PyImageSearch reader on Twitter, @makingyouthink. Paraphrasing the tweets @makingyouthink and I exchanged, the question was: Have you ever seen a Python implementation of Measuring…
    - 21 days ago 5 Jun 17, 2:00pm -
  • Montages with OpenCV
    Today’s blog post is inspired by an email I received from PyImageSearch reader, Brian. Brian asks: Hi Adrian, I’m really enjoying the PyImageSearch blog. I found your site a few days ago and I’ve been hooked on your tutorials ever since. I foll…
    - 28 days ago 29 May 17, 2:00pm -
  • Face Alignment with OpenCV and Python
    Continuing our series of blog posts on facial landmarks, today we are going to discuss face alignment, the process of: Identifying the geometric structure of faces in digital images. Attempting to obtain a canonical alignment of the face based on tra…
    - 35 days ago 22 May 17, 2:00pm -

Walking Randomly

  • HPC-centric Research Software Engineering role within RSE Sheffield
    A job opportunity within the RSE Sheffield group is available under the job title of “Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling”. This is an EU-funded position with a focus on supporting the biome…
    - 33 days ago 24 May 17, 6:43am -
  • Faster transpose matrix multiplication in R
    I’m working on optimising some R code written by a researcher at the University of Sheffield and it’s very much a war of attrition! There’s no easily optimisable hotspot and there’s no obvious way to leverage parallelism. Progress is being made by s…
    - 34 days ago 23 May 17, 9:42am -
  • How powerful are Microsoft Azure’s free Jupyter notebooks?
    For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F#, with up to 4GB of memory per session. Anyone with a Microsoft account can upload their o…
    - 42 days ago 15 May 17, 7:05am -
  • Research Software Engineering: State of the Nation 2017
    I am a co-investigator on an EPSRC-funded grant called the RSE-N (Research Software Engineering Network), the aim of which is to co-ordinate various Research Software Engineering activities nationally.  One of the outputs of this work is a ‘State…
    - 77 days ago 10 Apr 17, 3:40pm -
  • High Performance Computing – There’s plenty of room at the bottom
    UK to launch 6 major HPC centres. Tomorrow, I’ll be attending the launch event for the UK’s new HPC centres and have been asked to deliver a short talk as part of the program. As someone who paddles in the shallow end of the HPC pool I find this b…
    - 89 days ago 29 Mar 17, 8:13pm -