• !10 Jobs for R users from around the world (2018-07-17)
    To post your R job on the next post, just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure). Current R jobs Job seekers: ple…
    - 35 mins ago 17 Jul 18, 8:32am -
  • !Hamiltonian tails
    “We demonstrate HMC’s sensitivity to these parameters by sampling from a bivariate Gaussian with correlation coefficient 0.99. We consider three settings (ε,L) = {(0.16; 40); (0.16; 50); (0.15; 50)}” Ziyu Wang, Shakir Mohamed, and Nando De Fre…
    - 11 hours ago 16 Jul 18, 10:18pm -
  • !Continuous deployment of package documentation with pkgdown and Travis CI
    The problem: pkgdown is an R package that can create a beautiful-looking website for your own R package. Built and maintained by Hadley Wickham and his gang of prolific contributors, this package can parse the documentation files and vignettes for y…
    - 16 hours ago 16 Jul 18, 5:32pm -
  • !New Course Content: DS4B 201 Chapter 7, The Expected Value Framework For Modeling Churn With H2O
    I’m pleased to announce that we released brand new content for our flagship course, Data Science For Business (DS4B 201). The latest content is focused on transitioning from modeling Employee Churn with H2O and LIME to evaluating our binary classif…
    - 17 hours ago 16 Jul 18, 3:45pm -
  • !Twitter coverage of the useR! 2018 conference
    In summary: useR!, the conference for users of R, was held in Brisbane earlier this month; it sounded like a lot of fun; and here’s an analysis of tweets that used the #useR2018 hashtag during the week. The code that generated the report (which I’ve u…
    - 21 hours ago 16 Jul 18, 12:09pm -

High Scalability

  • Stuff The Internet Says On Scalability For July 13th, 2018
    Hey, it's HighScalability time: Steve Blank tells the Secret History of Silicon Valley. What a long, strange trip it is. Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you kno…
    - 4 days ago 13 Jul 18, 3:56pm -
  • Sponsored Post: Datadog, InMemory.Net, Triplebyte, Etleap, Scalyr, MemSQL
    Who's Hiring? Twitch's commerce team in San Francisco is looking to hire senior developers to keep up with rapidly increasing demand for our Subscriptions and Payment platform. Engineers will be tasked with building new products and features to so…
    - 7 days ago 10 Jul 18, 3:56pm -
  • Stuff The Internet Says On Scalability For July 6th, 2018
    Hey, it's HighScalability time: Could RAINB (Redundant Array of Independent Neanderthal ‘minibrains’) replace TPUs as the future AI core? Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great de…
    - 11 days ago 6 Jul 18, 3:56pm -
  • In Defense of Humanity—How Complex Systems Failed in Westworld **spoilers**
    The Westworld season finale made an interesting claim: humans are so simple and predictable they can be encoded by a 10,247-line algorithm. Small enough to fit in the pages of a thin virtual book. Perhaps my brain was already driven into a meta…
    - 15 days ago 2 Jul 18, 4:39pm -
  • Stuff The Internet Says On Scalability For June 29th, 2018
    Hey, it's HighScalability time: Rockets. They're big. You won't believe how really really big they are. (Corridor Crew) Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you kno…
    - 18 days ago 29 Jun 18, 4:04pm -


  • Preparing for the Data Science Job Hunt
    Job hunting is stressful, especially if you’re moving into an entirely new field. In this post, we give tips on finding data science jobs, looking up salaries, and what to do before you apply.
    - 6 days ago 11 Jul 18, 12:00pm -
  • Data Science Project Style Guide
    We've been building guided projects for three years — here is what we've learned, and tips that will help you in your job hunt.
    - 8 days ago 9 Jul 18, 12:00pm -
  • Basic Statistics in Python: Descriptive Statistics
    Statistics, done correctly, allows us to extract knowledge from the vague, complex, and difficult real world. In this post, we explore descriptive statistics.
    - 14 days ago 3 Jul 18, 12:00pm -
  • DIY AI for the Future
    AI is set to disrupt our current society on a major scale. Check out these new DIY projects you can try at home.
    - 20 days ago 27 Jun 18, 12:00pm -
  • Top 12 Essential Command Line Tools for Data Scientists
    Increase your productivity with this overview of a dozen Unix-like operating system command line tools.
    - 27 days ago 20 Jun 18, 12:00pm -


  • Because it's Friday: Language and Thought
    Does the language we speak change the way we think? This TED talk by Lera Boroditsky looks at how language structures like gendered nouns, or the way directions are described, might shape the way speakers of those languages think about things: This…
    - 3 days ago 13 Jul 18, 9:36pm -
  • New open data sets from Microsoft Research
    Microsoft has released a number of data sets produced by Microsoft Research and made them available for download at Microsoft Research Open Data. The Datasets in Microsoft Research Open Data are categorized by their primary research area, such as Phy…
    - 4 days ago 12 Jul 18, 10:33pm -
  • In case you missed it: June 2018 roundup
    In case you missed them, here are some articles from June of particular interest to R users. An animated visualization of global migration, created in R by Guy Abel. My take on the question, Should you learn R or Python for data science? The BBC and…
    - 6 days ago 10 Jul 18, 11:20pm -
  • R 3.5.1 update now available
    Last week the R Core Team released the latest update to the R statistical data analysis environment, R version 3.5.1. This update (codenamed "Feather Spray" — a Peanuts reference) makes no user-visible changes and fixes a few bugs. It is backwards-…
    - 7 days ago 9 Jul 18, 11:35pm -
  • Because it's Friday: Wavy Lines Illusion
    Another fine illusion: in this one, the pairs of horizontal lines are all smooth sine curves, despite the appearance of the jagged zig-zags: It's really hard for me at least to tell that the zig-zags in the light grey region are actually curved. Zoom…
    - 17 days ago 29 Jun 18, 10:11pm -

Data Analytics and R

  • !Document worth reading: “An Introduction to Image Synthesis with Generative Adversarial Nets”
    There has been a drastic growth of research in Generative Adversarial Nets (GANs) in the past few years. Proposed in …Continue reading →
    - 3 hours ago 17 Jul 18, 6:07am -
  • !Distilled News
    An Introductory Guide to Maximum Likelihood Estimation (with a case study in R) Interpreting how a model works is one …Continue reading →
    - 5 hours ago 17 Jul 18, 4:05am -
  • !R Packages worth a look
    Bus and Transit Time Calculations (bustt) Calculate and work with time and schedules for bus, train, etc on transit data. Answer …Continue reading →
    - 7 hours ago 17 Jul 18, 2:03am -
  • !Book Memo: “Machine Learning on Data Lake”
    Accessing and cataloging data offers the ability to use and connect into new analytical techniques and services, such as predictive …Continue reading →
    - 9 hours ago 17 Jul 18, 12:01am -
  • !If you did not already know
    Integer Echo State Network (intESN) We propose an integer approximation of Echo State Networks (ESN) based on the mathematics of …Continue reading →
    - 11 hours ago 16 Jul 18, 10:23pm -

Google Cloud

  • Using instance metadata in Cloud Dataproc initialization actions
    By Julien Phalip, Solutions Architect. Instance metadata is a powerful feature in Google Cloud Platform’s Compute Engine. Each Compute Engine instance comes with several metadata values that are set by default to provide useful information like the…
    - 4 days ago 13 Jul 18, 9:00am -
  • 6 must-see sessions on the Internet of Things (IoT) at Next ‘18
    By Arun Ananthampalayam, Product Marketing Manager, Cloud IoT. The Internet of Things (IoT) brings the impact of cloud computing to a variety of remote devices. This year we’re offering 18 IoT-oriented sessions on topics ranging from building a…
    - 6 days ago 11 Jul 18, 12:00am -
  • 7 must-see sessions on data analytics at Next ‘18
    By Saptarshi Mukherjee, Product Marketing Lead, Data analytics & IoT. From understanding Wikipedia pageview data to determining patent coverage from both private and public patent datasets, big data has proved essential to solving numerous interestin…
    - 6 days ago 11 Jul 18, 12:00am -
  • How to train a ResNet image classifier from scratch on TPUs on Cloud ML Engine
    By Lak Lakshmanan, Technical Lead, Machine Learning and Big Data Professional Services. Tensor Processing Units (TPUs) are hardware accelerators that greatly speed up the training of deep learning models. In independent tests conducted by Stanford U…
    - 7 days ago 10 Jul 18, 9:00am -
  • Measuring patent claim breadth using Google Patents Public Datasets
    By Otto Stegmaier, Data Scientist, Global Patents. Last fall, we released the Google Patents Public Datasets on BigQuery. These datasets include a collection of publicly accessible, connected database tables that enable empirical analysis of the int…
    - 7 days ago 10 Jul 18, 6:00am -

The HortonWorks Blog

  • !This Big Data Business Strategy Is Your Formula for Success
    Your big data business strategy is just the starting point if you expect your business to achieve long-term success. The post This Big Data Business Strategy Is Your Formula for Success appeared first on Hortonworks.
    - 16 hours ago 16 Jul 18, 5:00pm -
  • Announcing the General Availability of Hortonworks Data Platform 3.0.0, Ambari 2.7.0 and SmartSense 1.5.0
    We’re excited to announce the long-awaited release of Hortonworks Data Platform 3.0.0. HDP 3.0 is faster, smarter, hybrid, bigger, trusted and real-time database. We encourage you to read our blog that announced HDP 3.0. You can also view our keyno…
    - 4 days ago 13 Jul 18, 7:01pm -
  • A Step-by-Step Guide for HDFS Replication
    This blog focuses on on-prem to on-prem HDFS replication for HDP clusters using Hortonworks Data Lifecycle Manager (DLM), an extensible service built on the Hortonworks DataPlane Platform. Introduction to DLM Data Lifecycle Manager (DLM) delivers…
    - 5 days ago 12 Jul 18, 3:53pm -
  • How Real-Time Data Is Affecting Healthcare
    Medical devices are becoming increasingly connected and are able to relay real-time data to analytics systems that can produce actionable information where it counts. The post How Real-Time Data Is Affecting Healthcare appeared first on Hortonworks.
    - 5 days ago 12 Jul 18, 2:00pm -
  • Three Things CEOs Should Know About the Use of Artificial Intelligence in Decision-Making
    Until CEOs understand the use of artificial intelligence in decision-making, enterprises will not be ready to jump into machine learning. The post Three Things CEOs Should Know About the Use of Artificial Intelligence in Decision-Making appeared firs…
    - 6 days ago 11 Jul 18, 2:00pm -


  • !OpenCV Saliency Detection
    Today’s tutorial is on saliency detection, the process of applying image processing and computer vision algorithms to automatically locate the most “salient” regions of an image. In essence, saliency is what “stands out” in a photo or scene…
    - 19 hours ago 16 Jul 18, 2:00pm -
  • An interview with Adam Geitgey, creator of the face_recognition Python library
    You may have noticed that over the past couple of weeks we have been using a special Python package called face_recognition quite a bit on the PyImageSearch blog: We first used it to build a face recognition system We then applied…
    - 6 days ago 11 Jul 18, 2:00pm -
  • Face clustering with Python
    Today’s blog post is inspired by a question from PyImageSearch reader, Leonard Bogdonoff. After I published my previous post on Face recognition with OpenCV and deep learning, Leonard wrote in and asked: Hey Adrian, can you go into identity cluster…
    - 8 days ago 9 Jul 18, 2:00pm -
  • An interview with Francois Chollet
    In today’s blog post, I interview arguably one of the most important researchers and practitioners in modern day deep learning, Francois Chollet. Francois is not only the creator of the Keras deep learning library, but he’s also a Google AI res…
    - 15 days ago 2 Jul 18, 2:00pm -
  • Raspberry Pi Face Recognition
    In last week’s blog post you learned how to perform Face recognition with Python, OpenCV, and deep learning. But as I hinted at in the post, in order to perform face recognition on the Raspberry Pi you first need to consider a few optimizations —…
    - 22 days ago 25 Jun 18, 2:00pm -

Walking Randomly