The History of Computing

199 Episodes. Produced by Charles Edge.

Computers touch almost every aspect of our lives today. We take for granted the way they work and the unsung heroes who built the technology, protocols, philosophies, and circuit boards, patched them all together, and sometimes willed amazingness out of nothing. Not in this podcast. Welcome to th…

10:50

The R Programming Language

R is the 18th letter of the Latin alphabet. It represents the rhotic consonant, or the r sound. It goes back to the Greek Rho, the Phoenician Resh before that, and the Egyptian rêš, which is the same name the Egyptians had for head, before that. R appears in about 7 and a half percent of the words in the English dictionary.

And R is probably the best language out there for programming around various statistical and machine learning tasks. We may use tools like TensorFlow imported into languages like Python to prototype, but R is incredibly performant for all the maths. And so it has become an essential piece of software for data scientists.

The R programming language was created in 1993 by two statisticians, Robert Gentleman and Ross Ihaka, at the University of Auckland, New Zealand. It has since been ported to practically every operating system and is available at r-project.org. R is an implementation of the S language, and the name is partly a play on "S" and partly a nod to the first names of its two authors. S also became a commercial software package, which we’ll discuss in a bit. R is primarily written in C, with some Fortran and, since then, even R itself.

And there have been statistical packages since the very first computers were used for math. 

IBM in fact packaged up BMDP when they first started working on the idea at the UCLA Health Computing Facility. That was 1957. Then came SPSS out of the University of Chicago in 1968. And the same year, John Sall and others gave us SAS (or Statistical Analysis System) out of North Carolina State University. And those evolved from those early days through into the 80s with the advent of object oriented everything, and thus got not only windowing interfaces but also extensibility, code sharing, and, as we moved into the 90s, acquisitions. BMDP was acquired by SPSS, who was then acquired by IBM, and the products were getting more expensive but not getting a ton of key updates for the same scientific and medical communities.

And so we saw the upstarts in the 80s: Data Desk and JMP and others. Tools built for windowing operating systems and in object oriented languages. We got the ability to interactively manipulate data, zoom in and spin three dimensional representations of data, and all kinds of pretty aspects. But they were not a programmer's tool.

S was begun in the seventies at Bell Labs and was supposed to be a statistical MATLAB, a language specifically designed for number crunching. And the statistical techniques were far beyond where SPSS and SAS had stopped. And with the breakup of Ma Bell, parts of Bell became Lucent, which sold S to Insightful Corporation, who released S-PLUS and would later get bought by TIBCO. Keep in mind, Bell was testing line quality and statistics and, going back to World War II, employed some of the top scientists in those fields, ones who would later create large chunks of the quality movement and implementations like Six Sigma. Once S went to what was basically a standalone software company, it became less about the statistics and more about porting to different computers to make more money.

Private equity and portfolio conglomerates are, by nature, after improving the multiples on a line of business. But sometimes statisticians in various fields might feel left behind. And this is where R comes into the picture. R gained popularity among statisticians because it made it easier to write complicated statistical algorithms without learning an entire programming language. Its popularity has grown significantly since then. R has been described as a cross between MATLAB and SPSS, but much faster.


R was initially designed to be a language that could handle statistical analysis and other types of data mining, an offshoot of which we now call machine learning. R is also an open-source language and as with a number of other languages has plenty of packages available through a package repository - which they call CRAN (Comprehensive R Archive Network). This allows R to be used in fields outside of statistics and data science or to just get new methods to do math that doesn’t belong in the main language. 

There are over 18,000 packages for R. One of the more popular is ggplot2, an open-source data visualization package. data.table is another that performs programmatic data manipulation operations. dplyr provides functions designed to enable data frame manipulation in an intuitive manner. tidyr helps create tidier data. Shiny generates interactive web apps. And there are plenty of packages to make R easier, faster, and more extensible.

By 2015, more than 10 million people used R every month and it’s now the 13th most popular language in use. And the needs have expanded. We can drop R scripts into other programs and tools for processing. And some of the workloads are huge. This led to support for parallel computing in R, specifically using MPI (Message Passing Interface).

R is one of the most popular languages used for statistical analysis, statistical graphics generation, and data science projects. There are other languages and tools for specific uses, but R has even started being used in those areas.

The latest version, R 4.1.2, was released on November 1, 2021. R development, as with most thriving open source solutions, is guided by a group of core developers supported by contributions from the broader community. It became popular because it provides all the essential features for data mining and graphics needed for academic research and industry applications, and because of its pluggable, robust, and versatile nature.

And projects like TensorFlow, NumPy, and scikit-learn have evolved for other languages. And there are services from companies like Amazon that can host and process assets from both R and those other languages, whether using non-relational (NoSQL) databases or Jupyter notebooks.

A Jupyter Notebook is a JSON document, following a versioned schema, that contains an ordered list of input/output cells which can contain code, text (using Markdown), formulas, algorithms, plots, and even media like audio or video. Project Jupyter was a spin-off of IPython, but the goal was to create a language-agnostic tool where we could execute aspects in Ruby or Haskell or Python or even R. This gives us so many ways to get our data into the notebook, in batches or deep learning environments or whatever pipeline needs to be built based on an organization’s stack. That is especially true if the notebook has a frontend based on Amazon SageMaker Notebooks, Google's Colaboratory, or Microsoft's Azure Notebooks.
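To make the "JSON document with an ordered list of cells" idea concrete, here is a minimal sketch in Python that builds a tiny notebook by hand using only the standard library. The helper names (markdown_cell, code_cell, build_notebook) are made up for this illustration, and the kernelspec values for an R kernel ("ir" and "R") are an assumption about how an R kernel is commonly registered. The top-level keys and cell fields follow version 4 of the notebook format; real tooling would normally use a dedicated library rather than raw dictionaries.

```python
import json


def markdown_cell(text):
    # A Markdown cell only needs a type, metadata, and its source text.
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    # A code cell also carries an execution counter and a list of outputs.
    # Both are empty here because the cell has not been run.
    return {
        "cell_type": "code",
        "metadata": {},
        "source": source,
        "execution_count": None,
        "outputs": [],
    }


def build_notebook(cells):
    # Top-level structure of a version 4 notebook: the schema version,
    # notebook-wide metadata, and the ordered list of cells.
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            # Assumption: an R kernel registered under the name "ir".
            "kernelspec": {"name": "ir", "display_name": "R", "language": "R"}
        },
        "cells": list(cells),
    }


notebook = build_notebook(
    [
        markdown_cell("# Mean of a small sample"),
        code_cell("x <- c(1, 2, 3, 4)\nmean(x)"),
    ]
)

# Serializing the dictionary gives the text that would live in an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)
```

Because the file is plain JSON, any language with a JSON parser can read or generate a notebook, which is a large part of why the format works across Ruby, Haskell, Python, and R.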

Think about this. 25% of languages lack a rhotic consonant. Sometimes it seems like we’ve got languages that do everything or that we’ve built products that do everything. But I bet no matter the industry or focus or sub-specialty, there’s still 25% more automation or investigation into our own data to be done. Because there always will be.

