    As I've been blogging more about statistics, R, and research in general, I've been trying to increase my online presence, sharing my blog posts in groups of like-minded people. Those efforts seem to have paid off, based on my view counts over the pas…
  A guide to working with character data in R
    R is primarily a language for working with numbers, but we often need to work with text as well. Whether it's formatting text for reports, or analyzing natural language data, R provides a number of facilities for working with character data. Handling…
  • Using DataCamp’s Autograder to Teach R
    Immediate and personalized feedback has been central to the learning experience on DataCamp since we launched the first courses. If students submit code that contains a mistake, they are told where they made a mistake, and how they can fix this. You…
  • Melt and cast the shape of your data.frame – Exercises
    Datasets often arrive in a form that is different from what we need for our modelling or visualisation functions, which in turn don't necessarily require the same format. Reshaping data.frames is a step that all analysts need but many struggle…
  • Creating Slopegraphs with R
    Presenting data results in the most informative and compelling manner is part of the role of the data scientist. It's all well and good to master the arcana of some algorithm, to manipulate and master the numbers and bend them to your will to produce…
High Scalability

  • Stuff The Internet Says On Scalability For June 22nd, 2018
    Hey, it's HighScalability time: 4th of July may never be the same. China creates stunning non-polluting drone swarm firework displays. Each drone is rated with a game mechanic and gets special privileges based on performance (just kidding). (T…
  • Sponsored Post: Datadog, InMemory.Net, Triplebyte, Etleap, Scalyr, MemSQL
    Who's Hiring? Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here. Need excellent people? Advertise yo…
  • How Ably Efficiently Implemented Consistent Hashing
    This is a guest post by Srushtika Neelakantam, Developer Advocate for Ably Realtime, a realtime data delivery platform. You can view the original article, How to implement consistent hashing efficiently, on Ably's blog. Ably's realtime platform…
  • Stuff The Internet Says On Scalability For June 15th, 2018
    Hey, it's HighScalability time: Scaling fake ratings. A 5 star 10,000 phone Chinese click farm. (English Russia) Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone lo…
  • Open Source Database HA Resources from Severalnines
    Severalnines has spent the last several years writing blogs and crafting content to help make your open source database solutions highly available. We are fans of highscalability.com and wanted to post some links to our top resources to help read…
  Because it's Friday: The lioness sleeps tonight
    Handlers for the lion enclosure at San Diego Zoo have developed a novel way to provide stimulation for their big cats: let them play tug-of-war with people outside. People plural that is — it turns out that a young lioness is no match for a trio of…
  • AI, Machine Learning and Data Science Roundup: June 2018
    A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications I've noted over the past month or so. Open Source AI, M…
  • PYPL Language Rankings: Python ranks #1, R at #7 in popularity
    The new PYPL Popularity of Programming Languages (June 2018) index ranks Python at #1 and R at #7. Like the similar TIOBE language index, the PYPL index uses Google search activity to rank language popularity. PYPL, however, focuses on people searchi…
  • Because it's Friday: Olive Garden Bot
    Comedy writer Keaton Patti claims this commercial script for a US Italian restaurant chain was generated by a bot: I forced a bot to watch over 1,000 hours of Olive Garden commercials and then asked it to write an Olive Garden commercial of its own.…
Data Analytics and R

  Document worth reading: "Seeing the forest for the trees An investigation of network knowledge"
    This paper assesses the empirical content of one of the most prevalent assumptions in the economics of networks literature, namely …
  R Packages worth a look
    Techniques for Automated Classifiers (ncodeR): A set of techniques that can be used to develop, validate, and implement automated classifiers. A …
  Magister Dixit
    "There are four basic presentation types for charts: 1. Comparison 2. Composition 3. Distribution 4. Relationship" Dikesh Jariwala ( 29.12.2016 …
  If you did not already know
    TANKER Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of …
  Book Memo: "Mixed Intelligent Systems"
    Developing Models for Project Management and Evaluation Correctly functioning evaluation systems directly influence the efficient and effective planning and implementation …
Google Cloud

The HortonWorks Blog

  • Announcing Cloudbreak 2.7 GA
    We are excited to announce the release of Cloudbreak 2.7! This release includes many new enhancements that further extend the ability of the Enterprise to harness the agility of the cloud for big data workloads. Here is just a sampling of the new fea…
  • Introducing the 2018 Data Hero Nominees and Winners – Americas!
    Early last year we announced the Hortonworks Data Heroes initiative. It's our way of recognizing the Data Visionaries, Data Scientists, Data Architects, HCC Community Champion, and Cognitive Honors awards to organizations transforming their bus…
  • Teaming on Data: IBM and Hortonworks Broaden Relationship
    By Rob Thomas, General Manager, IBM Analytics. This story first appeared on the IBM Big Data & Analytics Hub. Data is driving business. And as volumes climb with no end in sight, companies have a decision to make: harness and extract insight from th…
  • Announcing HDP 3.0 – Faster, Smarter, Hybrid Data
    We are thrilled to announce that Hortonworks Data Platform (HDP) version 3.0 is now available for early access. For more information, go here.  HDP 3.0 delivers new capabilities for the enterprise to enable agile application deployment, new machine…
  • Explore the latest of Apache Hadoop YARN at Dataworks Summit San Jose 2018
    This blog post covers some of the sessions from Dataworks Summit San Jose 2018 that focus on the efforts of the Apache Hadoop YARN community. Come & explore the latest and greatest of Apache Hadoop YARN at Dataworks Summit San Jose 2018! Dataworks Su…
  • Face recognition with OpenCV, Python, and deep learning
    In today's blog post you are going to learn how to perform face recognition in both images and video streams using OpenCV, Python, and deep learning. As we'll see, the deep learning-based facial embeddings we'll be using here today are both (1) highl…
  • How to build a custom face recognition dataset
    In the next couple of blog posts we are going to train a computer vision + deep learning model to perform facial recognition… …but before we can train our model to recognize faces in images and video streams we first need to gather the dataset of…
  • Keras: Multiple outputs and multiple losses
    A couple weeks ago we discussed how to perform multi-label classification using Keras and deep learning. Today we are going to discuss a more advanced technique called multi-output classification. So, what's the difference between the two? And how…
  • Ubuntu 18.04: How to install OpenCV
    In this blog post you will learn how to install OpenCV on Ubuntu 18.04. In the past, I've authored a handful of installation guides for Ubuntu: Ubuntu 16.04: How to install OpenCV with Python 2.7 and Python 3.5+ Install OpenCV 3.0 and Python 2.7+ o…
  • An OpenCV barcode and QR code scanner with ZBar
    Today's blog post on reading barcodes and QR codes with OpenCV is inspired by a question I received from PyImageSearch reader, Hewitt: Hey Adrian, I really love the PyImageSearch blog. I look forward to your emails each week. Keep doing what you'…
Walking Randomly