• !Association rules using FPGrowth in Spark MLlib through SparklyR
    Introduction: Market Basket Analysis or association rules mining can be a very useful technique to gain insights into transactional data sets, and it can be useful for product recommendation. The classical example is data in a supermarket. For each cust…
    - 6 hours ago 23 Nov 17, 7:55pm -
  • !R live class | Professional R Programming | Nov 29-30 Milan
    If you wish to move forward from being an R user to become an R developer, it is time to take your programming skills to the next level. This course will give you an inner perspective of R working mechanisms, as well as tools for addressing your code's…
    - 7 hours ago 23 Nov 17, 6:13pm -
  • !EARL Boston round up
    Now we’ve recovered from overindulging in Boston’s culinary delights, we’re ready to share our highlights from this year’s EARL Boston Conference. Day 1 highli...
    - 9 hours ago 23 Nov 17, 4:14pm -
  • !Happy Thanksgiving!
    Today is Thanksgiving Day here in the US, so we're taking the rest of the week off to enjoy the time with family. Even if you don't celebrate Thanksgiving, today is still an excellent day to give thanks to the volunteers who have contributed to the R…
    - 10 hours ago 23 Nov 17, 4:00pm -
  • Last day: Data science courses in R (/python/and others) for $10 at Udemy (Black Friday sale)
    Udemy is offering readers of R-bloggers access to its global online learning marketplace for only $10 per course! This deal (offering a 50%-90% discount) is for hundreds of their courses – including many R-Programming, data science, machine…
    - 1 day ago 22 Nov 17, 6:05pm -

High Scalability


  • Setting Up the PyData Stack on Windows
    The speed of modern electronic devices allows us to crunch large amounts of data at home. However, these devices require the right software in order to reach peak performance. Luckily, it’s now easier than ever to set up your own data science envir…
    - 2 days ago 22 Nov 17, 7:00am -
  • How to Start a Data Science Meetup
    Meetups are great tools: you’re able to meet people in the field, keep up on industry news, and learn how to ‘talk the talk.’ Before I started attending meetups I wasn’t aware of just how much I didn’t know and still had to learn, let alone…
    - 8 days ago 16 Nov 17, 7:30am -
  • Kaggle Fundamentals: The Titanic Competition
    Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your…
    - 30 days ago 25 Oct 17, 7:00am -
  • Five Essential Traits of a Data Scientist
    Trillions of pixels have been deployed to answer the question ‘What makes a good data scientist?’ Most of these articles have focused on skills and tools of data science while almost none have discussed the personalities that make good, even grea…
    - 35 days ago 20 Oct 17, 7:00am -
  • SQL Fundamentals
    The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes) the data is relatively static…
    - 49 days ago 6 Oct 17, 7:00am -


  • !Happy Thanksgiving!
    Today is Thanksgiving Day here in the US, so we're taking the rest of the week off to enjoy the time with family. Even if you don't celebrate Thanksgiving, today is still an excellent day to give thanks to the volunteers who have contributed to the R…
    - 10 hours ago 23 Nov 17, 4:00pm -
  • Learnings from 5 months of R-Ladies Chicago (Part 1)
    by Angela Li, founder and organizer of R-Ladies Chicago. This article also appears on Angela's personal blog. It’s been a few months since I launched R-Ladies Chicago, so I thought I’d sit down and write up some things that I’ve learned in the…
    - 1 day ago 22 Nov 17, 5:38pm -
  • Scale up your parallel R workloads with containers and doAzureParallel
    by JS Tan (Program Manager, Microsoft). The R language is by far the most popular statistical language, and has seen massive adoption in both academia and industry. In our new data-centric economy, the models and algorithms that data scientists bu…
    - 2 days ago 21 Nov 17, 4:30pm -
  • R charts in a Tweet
    Twitter recently doubled the maximum length of a tweet to 280 characters, and while all users now have access to longer tweets, few have taken advantage of the opportunity. Bob Rudis used the rtweet package to analyze tweets sent with the #rstats has…
    - 3 days ago 20 Nov 17, 8:25pm -
  • Because it's Friday: Better living through chemistry
    This video is a compilation of some spectacular chemical reactions, with a few physics demonstrations thrown in for good measure. (But hey, chemistry is just applied physics, right?). That's all from us here at the blog for this week. Have a great we…
    - 6 days ago 17 Nov 17, 10:20pm -

Data Analytics and R

  • !Whats new on arXiv
    An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery While most classical approaches to Granger causality detection …Continue reading →
    - 2 hours ago 24 Nov 17, 12:01am -
  • !R Packages worth a look
    Preprocessing Algorithms for Imbalanced Datasets (imbalance): Algorithms to treat imbalanced datasets. Imbalanced datasets usually damage the performance of the classifiers. Thus, …Continue reading →
    - 3 hours ago 23 Nov 17, 10:23pm -
  • !Book Memo: “Introduction to High-Dimensional Statistics”
    Ever-greater computing technologies have given rise to an exponentially growing volume of data. Today massive data sets (with potentially thousands …Continue reading →
    - 5 hours ago 23 Nov 17, 8:21pm -
  • !Distilled News
    Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research Given the recent explosion of interest in …Continue reading →
    - 7 hours ago 23 Nov 17, 6:19pm -
  • !Book Memo: “Handbook of Metaheuristics”
    … an excellent book if you want to learn about a number of individual metaheuristics.’ (U. Aickelin, Journal of the …Continue reading →
    - 19 hours ago 23 Nov 17, 6:07am -

Google Cloud

  • Cloud OnAir shows you how to get ML-derived insight out of your data
    By Rajen Sheth, Director of Product Management, and William Vambenepe, Group Product Manager. Let’s face it, most businesses have plenty of data. And they know machine learning and data driven analysis can help them unlock valuable insights. But i…
    - 4 days ago 20 Nov 17, 12:00am -
  • Announcing BigQuery Data Transfer Service general availability
    By Matthew Tai, Product Manager, Google Cloud. At Cloud Next, we introduced the Google BigQuery Data Transfer Service, to automate data movement from SaaS applications to BigQuery on a scheduled, managed basis. BigQuery Data Transfer Service currentl…
    - 8 days ago 16 Nov 17, 12:00am -
  • How to integrate Dialogflow with Chatbase for easier bot analytics
    By Viknesh Krishnan, Chatbase Software Engineer, and Ksenia Gelfenbeyn, Dialogflow Product Manager. Dialogflow (formerly API.AI) is the end-to-end platform by Google for building great cross-platform conversational experiences. One important aspect t…
    - 8 days ago 16 Nov 17, 12:00am -
  • Automating ML and IoT with cloud-based image rendering, training, and device delivery
    By Preston Holmes and Adrian Graham, Solution Architects, Google Cloud Platform Since its release as an open source machine learning framework, TensorFlow has become a popular ecosystem for deep neural network development. With it, developers can…
    - 10 days ago 14 Nov 17, 12:00am -
  • Using Apache Beam and Cloud Dataflow to integrate SAP HANA and BigQuery
    By Babu Prasad, Technical Lead, Big Data, and Mark Shalda, Technical Program Manager. SAP HANA is an in-memory columnar database that you can use either as a persistence layer for applications in the SAP ecosystem, or as an independent enterprise data…
    - 11 days ago 13 Nov 17, 12:00am -

The HortonWorks Blog

  • IoT and Data Science – A Trucking Demo on DSX Local with Apache NiFi
    IBM’s Data Science Experience (DSX) comes in multiple flavors: cloud, desktop, and local. In this post we cover an IoT trucking demo on DSX local, i.e. running on top of Hortonworks Data Platform (HDP). We train and deploy a model, and then we use…
    - 1 day ago 22 Nov 17, 10:59pm -
  • Big Data London – UK readies for Global Data-Driven Upheaval
    Last week was the Big Data London conference, billed as the largest analytics event in the UK. The theme for the week was a call to arms for UK businesses to ready themselves for the fourth industrial revolution. Britain was the birthplace of the…
    - 2 days ago 22 Nov 17, 1:35am -
  • Addressing the Data Tipping Point
    In today’s world, every business is a data business. Without the insights from data, companies risk being left behind by their competitors. Technology continues to advance at an ever-increasing pace, and this generates more and more data. This incr…
    - 3 days ago 20 Nov 17, 6:21pm -
  • Building a global data lake for International Banking
    Financial institutions need to leverage all the information they can gather to guide future investments, reduce risk and detect fraud. These objectives directly influence an institution’s bottom line and have become more challenging with the rising…
    - 6 days ago 17 Nov 17, 7:34pm -
  • How Nissan is Harnessing Big Data to Provide Value to Customers
    We’ve just published our most recent case study! This one gives an in-depth look at how Nissan Motor Company Ltd (Nissan) is now able to store huge volumes of data and deploy a variety of data cross-functionally. Nissan is a Japanese multinational…
    - 10 days ago 13 Nov 17, 5:00pm -


  • Save the Date: PyImageConf 2018
    Imagine taking the practical, hands-on teaching style of the PyImageSearch blog… …and translating it to a live, in person conference. Sound interesting? If so, mark the date on your calendar now: PyImageConf is taking place on August 26th-28th 2…
    - 3 days ago 20 Nov 17, 3:00pm -
  • How to install mxnet for deep learning
    When it comes to deep learning, Keras is my favorite Python library… …but a close runner up is mxnet. What I like about mxnet is that it combines the best of both worlds in terms of performance and ease of use. Inside mxnet you’ll find: Caffe-l…
    - 10 days ago 13 Nov 17, 3:00pm -
  • Deep learning: How OpenCV’s blobFromImage works
    Today’s blog post is inspired by a number of PyImageSearch readers who have commented on previous deep learning tutorials wanting to understand what exactly OpenCV’s blobFromImage function is doing under the hood. You see,…
    - 17 days ago 6 Nov 17, 3:00pm -
  • How-To: Multi-GPU training with Keras, Python, and deep learning
    Keras is undoubtedly my favorite deep learning + Python framework, especially for image classification. I use Keras in production applications, in my personal deep learning projects, and here on the PyImageSearch blog. I’ve even based over two-thir…
    - 24 days ago 30 Oct 17, 2:00pm -
  • Raspberry Pi: Facial landmarks + drowsiness detection with OpenCV and dlib
    Today’s blog post is the long-awaited tutorial on real-time drowsiness detection on the Raspberry Pi! Back in May I wrote a (laptop-based) drowsiness detector that can be used to detect if the driver of a motor vehicle was getting tired and potenti…
    - 31 days ago 23 Oct 17, 2:00pm -

Walking Randomly

  • The Sheffield Research Software Engineering blog
    Taps microphone: ‘Is this still on?’ I’ve been blogging on here for over 10 years and this article marks the end of the largest gap in posting that I’ve ever done — almost 6 months!  A couple of people have asked me if I’ve given up on W…
    - 6 days ago 17 Nov 17, 2:33pm -
  • HPC-centric Research Software Engineering role within RSE Sheffield
    A job opportunity within the RSE Sheffield group is available under the job title of “Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling”. This is an EU-funded position with a focus on supporting the biome…
    - 24 May 17, 6:43am -
  • Faster transpose matrix multiplication in R
    I’m working on optimising some R code written by a researcher at University of Sheffield and it’s very much a war of attrition! There’s no easily optimisable hotspot and there’s no obvious way to leverage parallelism. Progress is being made by s…
    - 23 May 17, 9:42am -
  • How powerful are Microsoft Azure’s free Jupyter notebooks?
    For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F#, providing up to 4 GB of memory per session. Anyone with a Microsoft account can upload their o…
    - 15 May 17, 7:05am -
  • Research Software Engineering: State of the Nation 2017
    I am a co-investigator on an EPSRC-funded grant called the RSE-N (Research Software Engineering Network), the aim of which is to co-ordinate various Research Software Engineering activities nationally.  One of the outputs of this work is a ‘State…
    - 10 Apr 17, 3:40pm -