High Scalability



  • Microsoft R Open 3.4.3 now available
    Microsoft R Open (MRO), Microsoft's enhanced distribution of open source R, has been upgraded to version 3.4.3 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R (version 3.4.3) a…
    - 14 hours ago 17 Jan 18, 11:45pm -
  • A simple way to set up a SparklyR cluster on Azure
    The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) to the data, all the while l…
    - 2 days ago 16 Jan 18, 11:02pm -
  • Because it's Friday: Kite Ballet
    With a tip 'o the hat to Buck, enjoy the acrobatics of these kites from a performance in Oregon in 2012, set to Bohemian Rhapsody. Even after watching it a few times I still don't get how the lines don't get tangled up. That's all from the blog for t…
    - 6 days ago 12 Jan 18, 9:00pm -
  • Services and tools for building intelligent R applications in the cloud
    by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft) As an in-memory application, R is sometimes thought to be constrained in performance or scalability for enterprise-grade applications. But by deploying…
    - 6 days ago 12 Jan 18, 5:30pm -
  • How to implement neural networks in R
    If you've ever wondered how neural networks work behind the scenes, check out this guide to implementing neural networks from scratch in R, by David Selby. You may be surprised how with just a little linear algebra and a few R functions, you can trai…
    - 7 days ago 12 Jan 18, 12:33am -
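The neural network item above says that a little linear algebra and a few functions are enough to train a network. As a rough illustration of that idea (in Python rather than R, and not based on David Selby's guide), here is a minimal sketch assuming a one-hidden-layer network with sigmoid activations, squared-error loss and full-batch gradient descent on the XOR problem. `train_xor` and its parameters are hypothetical names chosen for this example.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden: int = 3, lr: float = 0.5, epochs: int = 5000, seed: int = 0):
    """Train a 2-input, `hidden`-unit, 1-output network on XOR.

    Returns (predict, losses); losses[i] is the mean squared error
    measured before the i-th gradient-descent update.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # W1[j][i]: weight from input i to hidden unit j; b1[j]: hidden bias
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    # W2[j]: weight from hidden unit j to the output; b2: output bias
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        y = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)
        return h, y

    losses = []
    for _ in range(epochs):
        # accumulate gradients over the whole batch
        gW1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, t in data:
            h, y = forward(x)
            loss += (y - t) ** 2
            # dL/dz_out for L = (y - t)^2 through the output sigmoid
            d_out = 2.0 * (y - t) * y * (1.0 - y)
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                # backpropagate through the hidden sigmoid (uses pre-update W2)
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                gW1[j][0] += d_h * x[0]
                gW1[j][1] += d_h * x[1]
                gb1[j] += d_h
            gb2 += d_out
        n = len(data)
        losses.append(loss / n)
        # gradient-descent step using the batch-averaged gradient
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            W1[j][0] -= lr * gW1[j][0] / n
            W1[j][1] -= lr * gW1[j][1] / n
            b1[j] -= lr * gb1[j] / n
        b2 -= lr * gb2 / n

    def predict(x):
        return forward(x)[1]

    return predict, losses
```

Calling `predict, losses = train_xor()` should show the loss falling over training; depending on the random start, plain gradient descent can still settle in a poor local minimum on XOR, so the sketch makes no promise that every run solves it.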

Data Analytics and R

  • Magister Dixit
    “Data warehouses have not been able to keep up with business demands for new sources of information, new types of …
    - 8 hours ago 18 Jan 18, 6:07am -
  • Document worth reading: “A review of change point detection methods”
    In this work, methods to detect one or several change points in multivariate time series are reviewed. They include retrospective …
    - 10 hours ago 18 Jan 18, 4:05am -
  • If you did not already know
    Markov Brains: Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many …
    - 12 hours ago 18 Jan 18, 2:03am -
  • Distilled News
    Online Learning Guide with Text Classification using Vowpal Wabbit (VW): A large number of E-Commerce and tech companies rely on …
    - 14 hours ago 18 Jan 18, 12:01am -
  • R Packages worth a look
    Density Estimation and Random Number Generation with Distribution Element Trees (detpack): Density estimation for possibly large data sets and conditional/unconditional random …
    - 16 hours ago 17 Jan 18, 10:23pm -
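The change point review above covers multivariate series and several change points. As a much simpler illustration of what a retrospective detector does, here is a minimal sketch assuming a univariate series with a single shift in mean: it tries every split and keeps the one that minimises the total within-segment squared error. This least-squares split is a standard textbook approach chosen for illustration, not necessarily one of the methods in the review; `best_single_change_point` and `min_size` are hypothetical names.

```python
def best_single_change_point(x, min_size=2):
    """Return the index k minimising SSE(x[:k]) + SSE(x[k:]).

    Assumes one shift in mean; each segment keeps at least min_size points.
    Runs in O(n^2), which is fine for a sketch but not for long series.
    """
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")

    def sse(seg):
        # sum of squared deviations from the segment mean
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    best_k, best_cost = None, float("inf")
    for k in range(min_size, n - min_size + 1):
        cost = sse(x[:k]) + sse(x[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

For `[0.0] * 50 + [5.0] * 50` the only zero-cost split is at index 50, so that is what the function returns. Handling several change points typically means applying a search like this recursively (binary segmentation) or using a penalised dynamic program.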

Google Cloud

  • Problem-solving with ML: automatic document classification
    By Ahmed Kachkach, Software Engineer. Text documents are one of the richest sources of data for businesses: whether in the shape of customer support tickets, emails, technical documents, user reviews or news articles, they all contain valuable inform…
    - 8 days ago 10 Jan 18, 9:00am -
  • Improving the efficiency of your helpdesk with serverless machine learning
    By Matthieu Mayran, Cloud Solutions Architect. Great customer service builds trust, inspires brand loyalty, and earns repeat business. So it’s no surprise that, according to Deloitte, close to 90 percent of organizations name improving the quality…
    - 22 days ago 28 Dec 17, 12:00am -
  • Busting 12 myths about BigQuery
    By Fereshteh Mahvar, Cloud Solutions Architect, and Ryan McDowell, Strategic Cloud Engineer. Not long ago, Forrester Research named Google Cloud the leader in their report, The Forrester Wave™: Insight Platforms-As-A-Service, Q3 2017 — and we coul…
    - 28 days ago 22 Dec 17, 12:00am -
  • New in TensorFlow 1.4: converting a Keras model to a TensorFlow Estimator
    By Sara Robinson and Josh Gordon, Developer Advocates. TensorFlow’s 1.4 release brings many new features — one of our favorites is support for converting a Keras model to a TensorFlow Estimator via the model_to_estimator() method. Why would you…
    - 32 days ago 18 Dec 17, 12:00am -
  • Bringing Cloud ML Engine to more developers with online prediction features and reduced prices
    By Justin Lawyer, Senior Product Manager. Good news — we’ve adjusted our prices for, and added new features to Cloud Machine Learning Engine to help you do more with ML. Here’s what you can expect. We’ve reduced our prices. This will make…
    - 36 days ago 14 Dec 17, 12:00am -

The HortonWorks Blog

  • New Series: Women @ Hortonworks
    Happy New Year! We have been looking back at some of the great achievements of the past year and one thing which really stands out is the important contributions we have had from Women at Hortonworks. We have women engineers contributing to Hortonwor…
    - 15 hours ago 17 Jan 18, 11:21pm -
  • 2017 Year In Review
    A new year is upon us, bringing refreshed optimism and new resolutions for many looking to the 12 months ahead. I’ve spent the holiday break reflecting on our past year and wanted to take a moment to share with you the tremendous confidence I have…
    - 8 days ago 10 Jan 18, 7:33pm -
  • 4 essential steps for managing sensitive data in your data lake
    By Balaji Ganesan, CEO of Privacera. How to leverage data discovery, control, anonymization and monitoring using Privacera, Apache Atlas and Ranger. As data in data lakes grows, so do security and compliance risks. These risks stem from storing and…
    - 13 days ago 5 Jan 18, 6:23pm -
  • Applying Big Data Streaming Analytics in the Real World
    IoT, the Internet of Things, has been a buzzword for the past five years. Literally everyone across all industries – business executives, line of business owners, operation staff, mechanical engineers, even retail marketers – has been eyeing the…
    - 14 days ago 4 Jan 18, 5:27pm -
  • YARN – The Capacity Scheduler
    Understanding the basic functions of the YARN Capacity Scheduler is something I typically deal with across all kinds of deployments. While capacity management has many facets, including sharing, chargeback, and forecasting, the focus of this blog will be on…
    - 28 days ago 21 Dec 17, 5:06pm -
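One basic function of the Capacity Scheduler mentioned above is guaranteed capacity in a queue hierarchy. In percentage-based configuration, each queue's `yarn.scheduler.capacity.<queue-path>.capacity` is a share of its parent queue, and sibling capacities at each level must add up to 100, so a queue's share of the whole cluster is the product of the percentages along its path. The sketch below computes those absolute shares; it assumes a plain mapping from queue paths to percentages (the function name and input shape are hypothetical) and ignores maximum-capacity elasticity, user limits and absolute-resource configuration.

```python
from collections import defaultdict


def absolute_capacities(capacity: dict[str, float]) -> dict[str, float]:
    """Map queue paths such as 'root.a.x' to their share (%) of the cluster.

    `capacity` holds each queue's percentage of its parent, mirroring
    yarn.scheduler.capacity.<queue-path>.capacity; 'root' is implicitly 100%.
    """
    children = defaultdict(list)
    for path in capacity:
        if path != "root":
            parent = path.rsplit(".", 1)[0]
            children[parent].append(path)

    # sibling queues must add up to 100% of their parent
    for parent, kids in children.items():
        total = sum(capacity[k] for k in kids)
        if abs(total - 100.0) > 1e-9:
            raise ValueError(f"children of {parent} sum to {total}%, not 100%")

    absolute = {"root": 100.0}

    def visit(path: str) -> None:
        # a child's absolute share is its parent's share scaled by its own %
        for child in children.get(path, []):
            absolute[child] = absolute[path] * capacity[child] / 100.0
            visit(child)

    visit("root")
    return absolute
```

With `root.a` at 60%, `root.b` at 40%, and `root.a.x` and `root.a.y` at 50% each, both leaf queues under `a` end up with a 30% guarantee of the cluster.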

PyImageSearch

  • Why I started a computer vision and deep learning conference
    The vast majority of blog posts here on PyImageSearch are very hands-on and follow a particular pattern: We explore a problem. We write some code to solve the problem. We look at the results, explaining what went well, what didn’t, and how w…
    - 3 days ago 15 Jan 18, 3:00pm -
  • PyImageConf 2018: The practical, hands-on computer vision and deep learning conference
    Today I’m pleased to announce the finalized details of an event I’ve been working on behind the scenes for quite some time: PyImageConf 2018: The practical, hands-on computer vision conference. Imagine taking the practical, hands-on teaching style…
    - 10 days ago 8 Jan 18, 3:00pm -
  • Taking screenshots with OpenCV and Python
    Happy New Year! It’s now officially 2018…which also means that PyImageSearch is (almost) four years old! I published the very first blog post on Monday, January 12th 2014. Since then over 230 posts have been published, along with two books and a…
    - 17 days ago 1 Jan 18, 3:00pm -
  • How to plot accuracy and loss with mxnet
    When it comes to high-performance deep learning on multiple GPUs (not to mention multiple machines), I tend to use the mxnet library. Part of the Apache Incubator, mxnet is a flexible, efficient, and scalable library for deep learning (Amazon eve…
    - 24 days ago 25 Dec 17, 3:00pm -
  • Keras and deep learning on the Raspberry Pi
    Today’s blog post is the most fun I’ve EVER had writing a PyImageSearch tutorial. It has everything we have been discussing the past few weeks, including: deep learning, Raspberry Pis, 3D Christmas trees, references to HBO’s Silicon Valley, “Not…
    - 31 days ago 18 Dec 17, 3:00pm -

Walking Randomly

  • Research Councils UK Cloud Workshop
    The RCUK Cloud Working Group are hosting their 3rd free annual workshop in January 2018 and I’ll be attending.  At the time of writing, there are still places left and you can sign up at https://www.eventbrite.co.uk/e/research-councils-uk-cloud-w…
    - 51 days ago 28 Nov 17, 2:18pm -
  • The Sheffield Research Software Engineering blog
    Taps microphone: ‘Is this still on?’ I’ve been blogging on here for over 10 years and this article marks the end of the largest gap in posting that I’ve ever done — almost 6 months!  A couple of people have asked me if I’ve given up on W…
    - 62 days ago 17 Nov 17, 2:33pm -
  • HPC-centric Research Software Engineering role within RSE Sheffield
    A job opportunity within the RSE Sheffield group is available under the job title of “Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling”. This is an EU-funded position with a focus on supporting the biome…
    - 24 May 17, 6:43am -
  • Faster transpose matrix multiplication in R
    I’m working on optimising some R code written by a researcher at University of Sheffield and it’s very much a war of attrition! There’s no easily optimisable hotspot and there’s no obvious way to leverage parallelism. Progress is being made by s…
    - 23 May 17, 9:42am -
  • How powerful are Microsoft Azure’s free Jupyter notebooks?
    For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F#, providing up to 4 GB of memory per session. Anyone with a Microsoft account can upload their o…
    - 15 May 17, 7:05am -
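On the transpose multiplication item above: in R, `t(X) %*% Y` first builds the transpose and then multiplies, while base R's `crossprod(X, Y)` is documented as formally equivalent to that call but usually slightly faster. The excerpt does not say which technique the post itself uses, so the following is only a sketch of the general idea in plain Python: compute XᵀY directly by walking the shared row index, so the transposed matrix is never materialised. The `crossprod` name here simply borrows R's and is not an existing Python function; real code would hand this to a BLAS-backed library instead.

```python
def crossprod(X, Y):
    """Return X^T Y for X (n x p) and Y (n x q), given as lists of rows.

    Each shared row k contributes the outer product of X[k] and Y[k],
    so no transposed copy of X is ever built.
    """
    n = len(X)
    if n == 0 or n != len(Y):
        raise ValueError("X and Y must be non-empty with the same number of rows")
    p, q = len(X[0]), len(Y[0])
    out = [[0.0] * q for _ in range(p)]
    for k in range(n):
        xk, yk = X[k], Y[k]
        for i in range(p):
            xi = xk[i]
            row = out[i]
            for j in range(q):
                row[j] += xi * yk[j]
    return out
```

For example, `crossprod([[1, 2], [3, 4], [5, 6]], [[1, 0], [0, 1], [1, 1]])` gives `[[6.0, 8.0], [8.0, 10.0]]`, the same result as transposing the first matrix and then multiplying.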