• !How to make a line chart with ggplot2
    Making a line chart in ggplot2 is pretty straightforward, if you know how it works ...
    - 8 hours ago 19 Sep 17, 7:00am -
  • !Accessing patent data with the patentsview package
    Why care about patents? 1. Patents play a critical role in incentivizing innovation, without which we wouldn't have much of the technology we rely on every day. What does your iPhone, Google's PageRank algorithm, and a butter substitute called Smart…
    - 8 hours ago 19 Sep 17, 7:00am -
  • !Talk like a pirate day 2017
    International Talk Like A Pirate Day Q: What has 8 legs, 8 arms and 8 eyes? A: 8 pirates. Avast, ye scurvy scum! Today be September 19, the International Talk Like A Pirate Day. While it’s a silly “holiday”, it’s a great chance to ...
    - 10 hours ago 19 Sep 17, 5:00am -
  • !Automating roxygen2 package documentation
    Take the mystery out of CRAN level package maintenance - Thinking of creating a new package? Dread the task of function documentation? Afraid to run devtools::check(build_args = '--as-cran') and get bombarded by Errors, Warnings, and Notes (o...
    - 15 hours ago 19 Sep 17, 12:00am -
  • !Recap: Applications of R at EARL London 2017
    The fourth EARL London conference took place last week, and once again it was an enjoyable and informative showcase of practical applications of R. Kudos to the team from Mango for hosting a great event featuring interesting talks and a friendly crow…
    - 16 hours ago 18 Sep 17, 11:17pm -

High Scalability


  • How to Generate FiveThirtyEight Graphs in Python
    If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to…
    - 12 days ago 7 Sep 17, 7:00am -
  • Machine Learning Fundamentals: Predicting Airbnb Prices
    Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-bu…
    - 19 days ago 31 Aug 17, 10:00am -
  • What's New in v1.29: New Mission Interface, PayPal and more!
    Our version 1.29 release is here and includes lots of new features to help enhance your learning experience. Over the past few months we’ve been tirelessly talking to students like you to learn how we can improve the mission interface. With…
    - 20 days ago 30 Aug 17, 3:00pm -
  • Python Cheat Sheet for Data Science: Intermediate
    The printable version of this cheat sheet The tough thing about learning data is remembering all the syntax. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, s…
    - 21 days ago 29 Aug 17, 12:00pm -
  • How to get your first job as a data scientist.
    Many aspiring data scientists focus on doing Kaggle competitions as a way to build their portfolios. Kaggle is an excellent way to practice, but it should only be one of many avenues you use to work on data science projects. This is because Kag…
    - 35 days ago 15 Aug 17, 8:00am -


  • !Hurricane Irma's rains, visualized with R
    The USGS has followed up their visualization of Hurricane Harvey rainfalls with an updated version of the animation, this time showing the rain and flooding from Hurricane Irma in Florida: Another #rstats #dataviz! Precip and #flooding from #Hurrican…
    - 1 hour ago 19 Sep 17, 1:51pm -
  • !Recap: Applications of R at EARL London 2017
    The fourth EARL London conference took place last week, and once again it was an enjoyable and informative showcase of practical applications of R. Kudos to the team from Mango for hosting a great event featuring interesting talks and a friendly crow…
    - 16 hours ago 18 Sep 17, 11:17pm -
  • Because it's Friday: Rapid Unscheduled Disassembly
    SpaceX has done some amazing work proving the concept of commercial spaceflight services. But that's not to say there haven't been a few bumps along the way, as this "blooper reel" (set to Monty Python music) shows. (If now's not a good time for vide…
    - 4 days ago 15 Sep 17, 7:30pm -
  • Microsoft R Open 3.4.1 now available
    Microsoft R Open (MRO), Microsoft's enhanced distribution of open source R, has been upgraded to version 3.4.1 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.4.1 and updates the bundle…
    - 4 days ago 15 Sep 17, 10:29am -
  • Working with data frames in SQL Server R Services
    Most R users are quite familiar with data frames: the data.frame is the fundamental object type for working with columnar data in R. But for SQL Server users, the data frame is an important concept to understand, since it will be the main object type…
    - 5 days ago 14 Sep 17, 4:56pm -

Data Analytics and R

  • !Whats new on arXiv
    Uncertainty relations with quantum memory for the Wehrl entropy We prove two new fundamental uncertainty relations with quantum memory for …Continue reading →
    - 9 hours ago 19 Sep 17, 6:07am -
  • !Book Memo: “Visualize This”
    The FlowingData Guide to Design, Visualization, and Statistics Data doesn’t decrease; it is ever-increasing and can be overwhelming to organize …Continue reading →
    - 11 hours ago 19 Sep 17, 4:05am -
  • !If you did not already know
    Deep Mutual Learning (DML) Model distillation is an effective and widely used technique to transfer knowledge from a teacher to …Continue reading →
    - 13 hours ago 19 Sep 17, 2:03am -
  • !Magister Dixit
    “Big Data and traditional data warehousing systems, however, have the similar goals to deliver business value through the analysis of …Continue reading →
    - 15 hours ago 19 Sep 17, 12:01am -
  • !Document worth reading: “Neural networks and rational functions”
    Neural networks and rational functions efficiently approximate each other. In more detail, it is shown here that for any ReLU …Continue reading →
    - 17 hours ago 18 Sep 17, 10:23pm -

Google Cloud

The HortonWorks Blog

  • !Lloyds Banking Group Brings Home Data Accolade
    Lloyds Banking Group has been awarded this year’s data accolade for its Digital Analytics System Delivery (DASD) project. The goal was to build a data environment that could be used to derive predictive models for fraud detection. Lloyds Banking Gr…
    - 18 hours ago 18 Sep 17, 8:55pm -
  • Engineering @ Hortonworks – The Matrix
    This is the introductory post in a blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms. In this post, we introduce the basic themes and set context for deeper discussions in subsequent blo…
    - 1 day ago 18 Sep 17, 3:00pm -
  • Benchmark Apache HBase vs Apache Cassandra on SSD in a Cloud Environment
    Overview As more and more workloads are being brought onto modern hardware in the cloud, it’s important for us to understand how to pick the best databases that can leverage the best hardware. Amazon has introduced instances with directly attached…
    - 5 days ago 14 Sep 17, 4:05pm -
  • Six Questions About Big Data Cyber Risk Answered
    Guest authored by Ross Porter, Director Presales Systems Engineering EMEA, DellEMC One of the hottest topics for both DellEMC and Hortonworks today is how to protect big data repositories, data lakes, from the emerging breed of cyber-attacks. We sat…
    - 5 days ago 14 Sep 17, 1:00pm -
  • Bondi Beach Isn’t the Only Reason to Go to Sydney Next Week: 3 DataWorks Summit/Hadoop Summit Keynotes to Attend
    At the DataWorks Summit/Hadoop Summit, these three keynote speeches on the future of analytics and its use in business are absolutely must-see.
    - 5 days ago 14 Sep 17, 3:39am -


  • Real-time object detection with deep learning and OpenCV
    Today’s blog post was inspired by PyImageSearch reader, Emmanuel. Emmanuel emailed me after last week’s tutorial on object detection with deep learning + OpenCV and asked: “Hi Adrian, I really enjoyed last week’s blog post on object detection…
    - 1 day ago 18 Sep 17, 2:00pm -
  • Object detection with deep learning and OpenCV
    A couple weeks ago we learned how to classify images using deep learning and OpenCV 3.3’s deep neural network module. While this original blog post demonstrated how we can categorize an image into one of Imag…
    - 8 days ago 11 Sep 17, 2:00pm -
  • Raspbian Stretch: Install OpenCV 3 + Python on your Raspberry Pi
    It’s been over two years since the release of Raspbian Jessie. As of August 17th, 2017, the Raspberry Pi foundation has officially released the successor to Raspbian Jessie — Raspbian Stretch. Just as I have done in previous blog posts, I’ll b…
    - 15 days ago 4 Sep 17, 2:00pm -
  • Fast, optimized ‘for’ pixel loops with OpenCV and Python
    Have you ever had to loop over an image pixel-by-pixel using Python and OpenCV? If so, you know that it’s a painfully slow operation even though images are internally represented by NumPy arrays. So why is this? Why are individual pixel accesses in…
    - 22 days ago 28 Aug 17, 2:00pm -
  • Deep Learning with OpenCV
    Two weeks ago OpenCV 3.3 was officially released, bringing with it a highly improved deep learning module. This module now supports a number of deep learning frameworks, including Caffe, TensorFlow, and Torch/Py…
    - 29 days ago 21 Aug 17, 2:00pm -

Walking Randomly

  • HPC-centric Research Software Engineering role within RSE Sheffield
    A job opportunity within the RSE Sheffield group is available under the job title of “Research Software Engineer in High Performance Computing (HPC) enabled Multi-Scale Modelling”. This is an EU-funded position with a focus on supporting the biome…
    - 24 May 17, 6:43am -
  • Faster transpose matrix multiplication in R
    I’m working on optimising some R code written by a researcher at University of Sheffield and it’s very much a war of attrition! There’s no easily optimisable hotspot and there’s no obvious way to leverage parallelism. Progress is being made by s…
    - 23 May 17, 9:42am -
  • How powerful are Microsoft Azure’s free Jupyter notebooks?
    For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F#, providing up to 4GB of memory per session. Anyone with a Microsoft account can upload their o…
    - 15 May 17, 7:05am -
  • Research Software Engineering: State of the Nation 2017
    I am a co-investigator on an EPSRC-funded grant called the RSE-N (Research Software Engineering Network), the aim of which is to co-ordinate various Research Software Engineering activities nationally.  One of the outputs of this work is a ‘State…
    - 10 Apr 17, 3:40pm -
  • High Performance Computing – There’s plenty of room at the bottom
    UK to launch 6 major HPC centres Tomorrow, I’ll be attending the launch event for the UK’s new HPC centres and have been asked to deliver a short talk as part of the program. As someone who paddles in the shallow-end of the HPC pool I find this b…
    - 29 Mar 17, 8:13pm -