Discovering Statistics

A den for Learning


A not so Gentle Introduction to R

  • The R system for statistical computing is available to everyone. All scientists, including, in particular, those working in developing countries, now have access to state-of-the-art tools for statistical data analysis without additional costs.

  • With the help of the R system for statistical computing, research really becomes reproducible when both the data and the results of all data analysis steps reported in a paper are available to the readers through an R transcript file.

  • R is widely used for teaching undergraduate and graduate statistics classes at universities all over the world because students can freely use the statistical computing tools.

  • The base distribution of R is maintained by a small group of statisticians, the R Development Core Team.

  • A huge amount of additional functionality is implemented in add-on packages authored and maintained by a large group of volunteers. The main source of information about the R system is the world wide web, with the official home page of the R project at http://www.R-project.org and the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org

    • All resources are available from this page: the R system itself, a collection of add-on packages, manuals, documentation and more.

As per the R Language Definition (version 4.3.1 (2023-06-16), DRAFT): R is a system for statistical computation and graphics. It provides, among other things, a programming language, high-level graphics, interfaces to other languages and debugging facilities.

Why R?

  1. Free, open source and available on every major platform
  2. A massive set of packages for statistical modelling, machine learning, visualisation, and importing and manipulating data
  3. Cutting-edge tools
  4. Deep-seated language support for data analysis
  5. A vibrant community
  6. Powerful tools for communicating your results
  7. A strong foundation in functional programming
  8. A dedicated IDE, RStudio (www.rstudio.com/ide/)
  9. Powerful metaprogramming facilities
  10. Designed to connect to high-performance programming languages like C, Fortran and C++

Cons of R

  1. Much R code is written in haste
  2. The R community is more focused on results than on processes
  3. Code is often written to reduce the amount of typing and is therefore hard to understand
  4. Inconsistency across contributed packages (a result of almost 30 years of evolution)
  5. R is a profligate user of memory

Overview of the R-System

  • The R system for statistical computing consists of two major parts: the base system and a collection of user-contributed add-on packages.

  • The R language is implemented in the base system. Implementations of statistical and graphical procedures are separated from the base system and are organised in the form of packages.

  • Both the base system and packages are distributed via the Comprehensive R Archive Network (CRAN), accessible under http://CRAN.R-project.org

  • The base system is available in source form and in precompiled form for various Unix systems, Windows platforms and Mac OS X. For the data analyst, it is sufficient to download the precompiled binary distribution and install it locally. Windows users can follow the link: http://CRAN.R-project.org/bin/windows/base/release.html

The base distribution already comes with some high-priority add-on packages namely:

mgcv      KernSmooth  MASS       base
boot      class       cluster    codetools
datasets  foreign     grDevices  graphics
grid      lattice     methods    nlme
nnet      rcompgen    rpart      spatial
splines   stats       stats4     survival
tcltk     tools       utils
Packages in the base distribution.
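
As a rough illustration of the split between the base system and add-on packages, the sketch below lists the packages a local installation reports with priority "base" and "recommended", then attaches one of them. It assumes a standard R installation; the variable names `base_pkgs` and `rec_pkgs` and the placeholder package name are illustrative only.

```r
# Packages that form the base system (priority "base")
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Recommended add-on packages bundled with most binary distributions
rec_pkgs <- rownames(installed.packages(priority = "recommended"))
print(rec_pkgs)

# A package must be attached before its functions can be called by name;
# stats4 is part of the base system, so it should always be present
library(stats4)

# Installing a contributed package from CRAN needs network access, so it is
# shown only as a comment ("somePackage" is a placeholder, not a real package):
# install.packages("somePackage")
```

The exact list of recommended packages can differ between R versions and platforms, so the printed output is best read as a description of the local installation.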

Help and Documentation

Roughly, three different forms of documentation for the R system for statistical computing may be distinguished:

  1. online help that comes with the base distribution or packages,
  2. electronic manuals, and
  3. published work in the form of books, etc.

More extensive documentation is available electronically from the collection of manuals at http://CRAN.R-project.org/manuals.html

Some of the electronic manuals available are:

  • An Introduction to R: A more formal introduction to data analysis with R.
  • R Data Import/Export: A very useful description of how to read and write various external data formats.
  • R Installation and Administration: Hints for installing R on special platforms.
  • Writing R Extensions: The authoritative source on how to write R programs and packages.
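
The online help mentioned above can be reached from the R prompt. The following is a minimal sketch using functions from the base and utils packages; the variable names `h` and `hits` are illustrative, and the calls that open a browser or build a search index are left as comments.

```r
# Look up the help page for a function; ?mean is the shorthand form.
# The result is assigned here so nothing is displayed; print(h) shows the page.
h <- help("mean")

# List the names of attached objects that match a regular expression
hits <- apropos("^read\\.")
print(hits)

# Keyword search across installed documentation (builds an index, can be slow)
# help.search("linear model")

# Open the hypertext version of the manuals in a browser
# help.start()
```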

Arithmetic Operators

Operator  Description
+         Add
-         Subtract
*         Multiply
/         Divide
^         Exponentiation
%%        Modulo (remainder)
%/%       Integer division (quotient)
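
A minimal sketch of the arithmetic operators in the table above. The variable names are illustrative; the last two lines show how the remainder and quotient behave for a negative left operand.

```r
total      <- 7 + 3      # 10
difference <- 7 - 3      # 4
product    <- 7 * 3      # 21
ratio      <- 7 / 2      # 3.5
power      <- 2 ^ 10     # 1024
remainder  <- 7 %% 3     # 1
quotient   <- 7 %/% 3    # 2

# The remainder takes the sign of the divisor and the quotient rounds down
neg_remainder <- (-7) %% 3    # 2
neg_quotient  <- (-7) %/% 3   # -3
```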

Relational Operators

<
>
<=
>=

Other Operators

Operator   Description
:: , :::   Access variables in a namespace
$ , @      Component / slot extraction
[ , [[     Indexing
:          Sequence operator
!          Negation
& , &&     And
| , ||     Or
~          As in formulae
-> , ->>   Rightwards assignment
<- , <<-   Assignment (right to left)
=          Assignment (right to left)
?          Help
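
The sketch below exercises several of these operators together with the relational operators from the previous list. All object names (`v`, `lst`, `f` and so on) are made up for illustration.

```r
v <- 1:5                       # ':' builds the sequence 1, 2, 3, 4, 5
second <- v[2]                 # '[' indexing returns the second element
lst <- list(a = 1, b = "x")
a_val <- lst$a                 # '$' extracts a component by name
b_val <- lst[["b"]]            # '[[' extracts a single element
med <- stats::median(v)        # '::' reaches an exported name in a namespace

gt3  <- v > 3                  # relational operators work element by element
both <- (v > 1) & (v < 5)      # '&' combines logical vectors elementwise
one  <- TRUE && FALSE          # '&&' combines two single logical values
neg  <- !TRUE                  # negation

10 -> y                        # rightwards assignment
z = 20                         # '=' also assigns at the top level
f <- y ~ x                     # '~' builds a formula; x is not evaluated here
```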

Constants in R

Constant Name  Description
LETTERS        The 26 upper-case letters of the Roman alphabet
letters        The 26 lower-case letters of the Roman alphabet
month.abb      The three-letter abbreviations for the English month names
month.name     The English names for the months of the year
pi             The ratio of the circumference of a circle to its diameter
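
A short sketch showing how these built-in constants can be used; the variable names are illustrative.

```r
first_upper <- LETTERS[1:3]      # "A" "B" "C"
n_lower     <- length(letters)   # 26
jan         <- month.abb[1]      # "Jan"
december    <- month.name[12]    # "December"

# Area of a circle with radius 2, using the built-in value of pi
area <- pi * 2^2
```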

Numeric Constants in R

Name         Description                      typeof()
Inf          Infinity                         double
NaN          Not a Number                     double
NA_real_     Not Available / Missing Real     double
NA_integer_  Not Available / Missing Integer  integer

  • In R, numeric constants start with a digit or period and are either a decimal or hexadecimal constant, optionally followed by L.

  • Hexadecimal constants start with 0x or 0X followed by a non-empty sequence from 0-9 a-f A-F, which is interpreted as a hexadecimal number, optionally followed by a binary exponent. A binary exponent consists of a P or p followed by an optional plus or minus sign followed by a non-empty sequence of (decimal) digits, and indicates multiplication by a power of two.

  • Decimal constants consist of a non-empty sequence of digits possibly containing a period (decimal point), optionally followed by a decimal exponent. A decimal exponent consists of an E or e followed by an optional plus or minus sign followed by a non-empty sequence of digits, and indicates multiplication by a power of ten.

  • A constant followed by i is regarded as an imaginary complex number.

  • A numeric constant immediately followed by L is regarded as an integer number when possible.

  • Only the ASCII digits 0-9 are recognized as digits, even in languages which have other representations of digits. The ‘decimal separator’ is always a period and never a comma.

  • A leading plus or minus is not regarded by the parser as part of a numeric constant but as a unary operator applied to the constant.
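
The rules above can be checked at the prompt. This is a minimal sketch with illustrative variable names; `quote()` is used only to inspect how the parser treats a leading minus sign.

```r
dbl_type <- typeof(1)       # "double": a plain numeric constant is a double
int_type <- typeof(1L)      # "integer": the L suffix requests an integer
cpl_type <- typeof(2i)      # "complex": the i suffix gives an imaginary number

hex_val     <- 0x10         # hexadecimal constant, equal to 16
hex_exp_val <- 0x1p4        # binary exponent: 1 * 2^4 = 16
dec_exp_val <- 1e3          # decimal exponent: 1 * 10^3 = 1000

# Special numeric constants and their storage types
special_types <- c(typeof(Inf), typeof(NaN), typeof(NA_real_), typeof(NA_integer_))

# A leading minus is a unary operator applied to the constant, so the parsed
# expression is a call rather than a bare number
minus_class <- class(quote(-1))   # "call"
```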

Function     Description
abs(x)       Absolute value
sign(x)      Sign of elements (1, 0, -1)
sqrt(x)      Square root
floor(x)     Largest integer not greater than x
ceiling(x)   Smallest integer not less than x
exp(x)       Exponential
expm1(x)     exp(x) - 1
log2(x)      Log with base 2
log10(x)     Log with base 10
log1p(x)     log(1+x)
cos(x)       Cosine
sin(x)       Sine
tan(x)       Tangent
acos(x)      Arc cosine
asin(x)      Arc sine
atan(x)      Arc tangent
cosh(x)      Hyperbolic cosine
sinh(x)      Hyperbolic sine
tanh(x)      Hyperbolic tangent
acosh(x)     Arc hyperbolic cosine
asinh(x)     Arc hyperbolic sine
atanh(x)     Arc hyperbolic tangent
cospi(x)     cos(pi*x)
sinpi(x)     sin(pi*x)
tanpi(x)     tan(pi*x)
gamma(x)     Gamma function
lgamma(x)    Natural log of the absolute value of the gamma function
digamma(x)   First derivative of the log of the gamma function
trigamma(x)  Second derivative of the log of the gamma function
cumsum(x)    Cumulative sums
cumprod(x)   Cumulative products
cummax(x)    Cumulative maxima
cummin(x)    Cumulative minima
Im(x)        Imaginary part of a complex number
Re(x)        Real part of a complex number
Arg(x)       Argument (angle) of a complex number
Conj(x)      Complex conjugate
Mod(x)       Modulus of a complex number
Mathematical Functions with single arguments
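
A few of these functions applied to small inputs, as a minimal sketch; the input values and variable names are chosen only for illustration.

```r
fl <- floor(2.7)              # 2
ce <- ceiling(2.1)            # 3
sg <- sign(c(-3, 0, 8))       # -1 0 1

cs <- cumsum(1:4)             # 1 3 6 10
cp <- cumprod(1:4)            # 1 2 6 24

# sinpi(x) computes sin(pi * x) and is exact at whole numbers,
# whereas sin(pi) carries a small floating-point error
exact_zero  <- sinpi(1)
approx_zero <- sin(pi)

# Parts of the complex number 3 + 4i
z    <- 3 + 4i
re_z <- Re(z)                 # 3
im_z <- Im(z)                 # 4
md_z <- Mod(z)                # 5
cj_z <- Conj(z)               # 3 - 4i
```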

Some Functions from base-package

Function            Description
.Call               Modern interfaces to C/C++ code
.Internal           Call an internal function
.Primitive          Look up a primitive function
.Library            Search paths for packages
capabilities        Report capabilities of this build of R
Arithmetic          Arithmetic operators
Constants           Built-in constants
NA                  Not available / missing values
NULL                The null object
InternalMethods     Many R-internal functions are generic and allow methods to be written for them
source              Read R code from a file, a connection or expressions
library             Load/attach and list packages
getwd               Get working directory
setwd               Set working directory
ls                  List all objects
Sys.info            Extract system and user information
Sys.time            Get current date and time
Sys.timezone        Get current time zone
Sys.localeconv      Get details of the numerical and monetary representations in the current locale
Sys.sleep           Suspend execution for a time interval
system.time         CPU time used
system.file         Find names of R system files
system              Invoke a system command
system2             Invoke a system command
dyn.load            Foreign function interface
date                System date and time
proc.time           Running time of R
quit                Terminate an R session
cat                 Concatenate and print
typeof              The type of an object
array               Create an array
as.array            Coerce an object to an array
is.array            Test whether an object is an array
c                   Combine values into a vector or list
cbind               Combine R objects by columns
rbind               Combine R objects by rows
integer             Create objects of type integer
as.integer          Coerce an object to type integer
is.integer          Test whether an object is of type integer
numeric             Numeric vectors
range               Range of values
vector              Produce a ‘simple’ vector of the given length and mode
as.vector           A generic; attempts to coerce its argument into a vector of a given mode
is.vector           Returns TRUE if the argument is a vector of the specified mode with no attributes other than names
append              Vector merging
names               Names of an object
labels              Find labels from objects
length              Length of an object
lengths             Lengths of list or vector elements
sum                 Sum of vector elements
cumsum              Cumulative sums
cumprod             Cumulative products
cummax              Cumulative maxima
cummin              Cumulative minima
summary             Object summaries
attach              Attach an object to the search path
detach              Detach an object from the search path
diff                Lagged differences
function            Function definition
is.function         Checks whether the argument is a function
lapply              Apply a function over a list or vector
data.frame          Data frames
data.matrix         Convert a data frame to a numeric matrix
mat.or.vec          Create a matrix or vector
t                   Transpose of a matrix
subset              Subsetting vectors, matrices and data frames
max.col             Find the maximum position in each row of a matrix
diag                Form a diagonal matrix
norm                Calculate the norm of a matrix
lower.tri           Lower triangle of a matrix
upper.tri           Upper triangle of a matrix
dim                 Dimensions of an object
dimnames            Dimension names of an object
det                 Calculate the determinant of a matrix
chol                The Cholesky decomposition
chol2inv            Inverse from a Cholesky (or QR) decomposition
eigen               Spectral decomposition of a matrix
qr                  The QR decomposition of a matrix
qr.X                Reconstruct the Q, R or X matrices from a QR object
colSums, colMeans   Column sums / means
rowSums, rowMeans   Row sums / means
crossprod           Matrix cross product
unique              Extract unique elements
expression          Unevaluated expression
eval                Evaluate an expression
unlist              Flatten a list
which               Which indices are TRUE
which.min           Where is the min() or max() or first TRUE or FALSE
ifelse              Conditional element selection
warning             Generate a warning message
warnings            Print warning messages
write               Write data to a file
tempfile            Create a name for a temporary file
nchar               Count the number of characters
rank                Return the sample ranks of the values in a vector, with several options for handling ties
zapsmall            Rounding of numbers, zapping small ones to zero
utf8ToInt           Convert a length-one character string encoded in UTF-8 to an integer vector of Unicode code points
intToUtf8           Convert a numeric vector of Unicode code points either to a single character string or a character vector
cut                 Convert numeric to factor
table               Cross tabulation and table creation
marginSums          Compute table margins
toString            Convert an R object to a character string
strsplit            Split the elements of a character vector
call                Function calls
tapply              Apply a function to each cell of a ragged array
jitter              Add noise to numbers
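
To give a feel for how some of these base functions combine, here is a small sketch on made-up data. The vectors `x` and `grp` and the matrix `m` are illustrative and not taken from the text.

```r
x   <- c(3, 1, 4, 1, 5)
grp <- c("u", "v", "u", "v", "u")

uniq    <- unique(x)               # 3 1 4 5
ones_at <- which(x == 1)           # positions 2 and 4
max_at  <- which.max(x)            # position 5
sizes   <- ifelse(x > 2, "big", "small")

# Sum of x within each group: u = 3 + 4 + 5, v = 1 + 1
sums <- tapply(x, grp, sum)

# Build a 2 x 2 matrix by columns, then summarise it
m        <- cbind(a = 1:2, b = 3:4)
m_dim    <- dim(m)                 # 2 2
col_tot  <- colSums(m)             # a = 3, b = 7
m_transp <- t(m)

# String helpers
parts   <- strsplit("a,b,c", ",")[[1]]   # "a" "b" "c"
n_chars <- nchar("statistics")           # 10
```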

Some Functions from utils-package

Function            Description
RHOME               R home directory
RSiteSearch         Search for keywords or phrases in documentation
help                Documentation
help.search         Search the help system
help.start          Hypertext documentation
nsl                 Look up the IP address by hostname
install.packages    Install packages from repositories or local files
installed.packages  Find installed packages
available.packages  List available packages at CRAN-like repositories
update.packages     Compare installed packages with CRAN-like repositories
remove.packages     Remove installed packages
browseEnv           Browse objects in an environment
browseURL           Load a URL in an HTML browser
head, tail          Return the first or last part of an object
object.size         Report the size allocated to an object
str                 List the structure of an arbitrary R object
ls.str              List objects and their structure
zip                 Create zip archives
unzip               Extract or list zip archives
read.fortran        Read fixed-format data in a Fortran-like style
read.table          Data input
write.table         Data output
as.roman            Get Roman numerals
sessionInfo         Get and report version information about R, the OS and attached or loaded packages
SHLIB               Build a shared object/DLL for dynamic loading
download.file       Download a file from the Internet
example             Run all the R code from the examples part of R’s online help
maintainer          Show the package maintainer
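
A minimal sketch of a few utils functions that work without network access; the objects created here are illustrative only.

```r
first3 <- head(letters, 3)          # "a" "b" "c"
last2  <- tail(letters, 2)          # "y" "z"

# Compact display of the structure of a small list
str(list(id = 1:3, tag = "x"))

# Memory used by a numeric vector of length 1000 (value is platform dependent)
sz <- object.size(numeric(1000))
print(sz)

# Roman numeral representation of an integer
roman <- as.roman(2024)
print(roman)

# Version information about R, the operating system and attached packages
info <- sessionInfo()
```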

Some Functions from graphics-package

Function         Description
barplot          Bar plots
hist             Histograms
dotchart         Cleveland’s dot plots
plot             Generic X-Y plotting (scatter plot)
stripchart       1-D scatter plots
smoothScatter    Scatterplots with smoothed densities color representation
plot.xy          Basic internal plot function
stem             Stem-and-leaf plot
boxplot          Box plots
boxplot.matrix   Draw a boxplot for each column (row) of a matrix
bxp              Draw box plots from summaries
matplot          Plot columns of matrices
cdplot           Conditional density plots
coplot           Conditioning plots
contour          Display contours
filled.contour   Level (contour) plots
curve            Draw function plots
mosaicplot       Mosaic plots
pie              Pie charts
spineplot        Spine plots and spinograms
stars            Star (spider/radar) plots and segment diagrams
strwidth         Plotting dimensions of character strings and math expressions
sunflowerplot    Produce a sunflower scatter plot
symbols          Draw symbols (circles, squares, stars, thermometers, boxplots)
frame            Create / start a new plot frame
layout           Specifying complex plot arrangements
par              Set or query graphical parameters
grid             Add a grid to a plot
abline           Add straight lines to a plot
lines            Add connected line segments to a plot
segments         Add line segments to a plot
box              Draw a box around a plot
clip             Set the clipping region
axis             Add an axis to a plot
axTicks          Compute axis tickmark locations
arrows           Add arrows to a plot
legend           Add legends to plots
text             Add text to a plot
title            Plot annotation
mtext            Write text into the margins of a plot
image            Display a color image
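
The sketch below shows how a high-level plotting function from this list is typically followed by low-level functions that add to the same plot. The data are made up, and opening a null PDF device is an assumption made here so that the code runs without a display; in an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped.

```r
# Assumption: draw to a null PDF device so no window or file is needed
pdf(NULL)

x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 6, 8, 9)

# High-level call: starts a new scatter plot
plot(x, y, main = "Scatter plot")

# Low-level calls: add to the existing plot
abline(h = mean(y), lty = 2)                       # horizontal reference line
text(2, 8, "label")                                # text at coordinates (2, 8)
legend("topleft", legend = "mean of y", lty = 2)   # legend for the dashed line

# hist() and boxplot() draw their plots and also return the computed summaries
h <- hist(x, main = "Histogram")
b <- boxplot(x, main = "Box plot")

invisible(dev.off())
```

The objects `h` and `b` hold the bin counts and the five-number summary, so the same calls are useful even when only the numbers are of interest.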