Data frames are general-purpose R objects used to store tabular data. A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column. Data frames are central to using R for any kind of data analysis. One of the most frustrating aspects of R for new users is that, unlike Excel, or even SPSS or Stata, it is not terribly easy to look at and modify data in a spreadsheet-like format.
Following are the characteristics of a data frame.
The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of numeric, factor or character type, among others.
Each column should contain the same number of data items.
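The characteristics above can be checked with a small sketch. The column names and values here are made up for illustration and do not come from any particular dataset.

```r
# Hypothetical example data: three columns of different types, equal length.
df <- data.frame(
  name  = c("Ann", "Bob", "Cara"),
  age   = c(31, 45, 27),
  group = factor(c("a", "b", "a")),
  stringsAsFactors = FALSE  # keep 'name' as character on older R versions too
)
str(df)

# Columns whose lengths differ and cannot be recycled are rejected.
length_error <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(length_error)

# Duplicate row names are rejected as well.
rowname_error <- tryCatch(
  { rownames(df) <- c("r1", "r1", "r2"); "no error" },
  error = function(e) conditionMessage(e)
)
print(rowname_error)
```

The tryCatch() calls are only there so the script keeps running and the error messages can be inspected; in interactive use the errors would simply be printed at the console.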
edit and fix
R provides two ways to edit an R dataframe (or matrix) in a spreadsheet-like fashion. They look the same, but are different! Both can be used to look at data in a spreadsheet-like way, but editing with them produces drastically different results.
The first of these is edit, which opens an R dataframe as a spreadsheet. The data can then be directly edited. When the spreadsheet window is closed, the resulting dataframe is returned to the user. Note that this does not actually change the original object, here called mydf. In other words, when we edit a dataframe, we are actually copying the dataframe, changing its values, and then returning the copy to the console. The original mydf is unchanged. If we want to use this modified dataframe, we need to save it as a new R object.
The second data editing function is fix. This is probably the more intuitive function. Like edit, fix opens the spreadsheet editor. But, when the window is closed, the result is used to replace the dataframe. Thus fix(mydf) replaces mydf with the edited data.
edit and fix can seem like a good idea. And if they are used simply to look at data, they're a great additional tool (along with summary, str, head, tail, and indexing).
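A minimal sketch of the difference between the two functions follows. The data frame mydf and the name mydf_edited are illustrative. Because edit and fix open an editor window, the calls are wrapped in interactive() so that the script still runs in a non-interactive session.

```r
# Hypothetical example data.
mydf <- data.frame(id = 1:3, score = c(10, 20, 30))

# Non-interactive ways to look at the data.
str(mydf)
summary(mydf)
head(mydf, 2)

if (interactive()) {
  # edit() returns a modified copy; mydf itself is left unchanged,
  # so the result has to be assigned to keep it.
  mydf_edited <- edit(mydf)

  # fix() replaces mydf with whatever was entered in the editor.
  fix(mydf)
}
```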
Creating Data Frames
Data frames are usually created by reading in a dataset with read.table() or read.csv(). However, data frames can also be created explicitly with the data.frame() function, or they can be coerced from other types of objects, such as lists.
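The three routes can be sketched as follows. The values are invented, and the text argument of read.csv() stands in for a file path so that the example does not depend on a file on disk.

```r
# 1. Explicitly, with data.frame().
df1 <- data.frame(id = 1:3, city = c("Oslo", "Lima", "Pune"))

# 2. By coercion from a list whose elements have equal length.
lst <- list(id = 4:6, city = c("Rome", "Kyiv", "Doha"))
df2 <- as.data.frame(lst)

# 3. By reading delimited text; in practice this would usually be a file path.
csv_text <- "id,city\n7,Bonn\n8,Cork"
df3 <- read.csv(text = csv_text)

str(df1)
str(df2)
str(df3)
```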
Adding on to Data Frames
We can use the cbind() function to add columns to a data frame. At least one of the objects being combined must already be a data frame; otherwise cbind() could produce a matrix.
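As a small illustration, with made-up column names, the first call below combines a data frame with a vector and the second combines only plain vectors.

```r
# Hypothetical example data.
df <- data.frame(id = 1:3, score = c(10, 20, 30))

# A data frame is among the inputs, so the result is a data frame.
with_col <- cbind(df, passed = c(FALSE, TRUE, TRUE))
is.data.frame(with_col)

# Only vectors are combined, so the result is a matrix.
m <- cbind(id = 1:3, score = c(10, 20, 30))
is.matrix(m)
```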
Adding Attributes to Data Frames
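Like matrices, data frames have dimensions, which dim() reports as the number of rows and columns; unlike matrices, these are derived from the row names and the columns rather than stored in a dim attribute. Data frames also carry attributes such as row names and column names, and can be given others, such as a comment.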
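A short sketch of inspecting and setting these properties, using invented names and a placeholder comment string:

```r
# Hypothetical example data.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

dim(df)                                  # rows, then columns

rownames(df) <- c("r1", "r2", "r3")      # set row names
colnames(df) <- c("count", "label")      # set column names
comment(df)  <- "Hypothetical example data"

# Lists everything currently attached to the object.
attributes(df)
```

The comment is not shown when the data frame is printed; it can be retrieved with comment(df).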
Subsetting Data Frames
Data frames possess the characteristics of both lists and matrices; if we subset with a single vector, they behave like lists and will return the selected columns with all rows; if we subset with two vectors, they behave like matrices and can be subset by row and column.
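The two subsetting styles can be compared in a brief sketch. The data and the variable names are illustrative only.

```r
# Hypothetical example data.
df <- data.frame(
  id    = 1:4,
  score = c(10, 20, 30, 40),
  grp   = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# One index vector: list-style, returns the chosen columns with all rows.
cols <- df[c("id", "score")]

# Two index vectors: matrix-style, rows first and columns second.
block <- df[2:3, c("id", "grp")]

# A logical row index keeps the rows where the condition is TRUE.
high <- df[df$score > 15, ]

# [[ ]] and $ extract a single column as a vector, as with lists.
scores <- df[["score"]]
```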
When a data frame is printed, the top line of the table, called the header, contains the column names. Each horizontal line afterward denotes a data row, which begins with the name of the row, followed by the actual data. Each data member of a row is called a cell.
To retrieve data in a cell, we can enter its row and column coordinates in the single square bracket "[]" operator. The two coordinates are separated by a comma. In other words, the coordinates begin with the row position, followed by a comma, and end with the column position. The order is important.
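For instance, with a small made-up data frame whose row names are r1 to r3, a cell can be retrieved by position or by name:

```r
# Hypothetical example data with explicit row names.
df <- data.frame(
  score = c(10, 20, 30),
  grp   = c("a", "b", "a"),
  row.names = c("r1", "r2", "r3"),
  stringsAsFactors = FALSE
)

by_position <- df[2, 1]            # row 2, column 1
by_name     <- df["r2", "score"]   # same cell, addressed by names
other_cell  <- df[1, 2]            # row 1, column 2

print(by_position)  # 20
print(by_name)      # 20
print(other_cell)   # "a"
```

Swapping the coordinates, as in df[1, 2] versus df[2, 1], addresses a different cell, which is why the order matters.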
Operations that can be performed on a DataFrame are:
Creating a DataFrame
Accessing rows and columns
Selecting the subset of the data frame
Adding extra rows and columns to the data frame
Adding new variables to the data frame based on existing ones
Deleting rows and columns in a data frame
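The last few operations in this list can be sketched with base R as follows. The data are invented, and rbind(), negative indexing, and assigning NULL are one common way to do this rather than the only one.

```r
# Hypothetical example data.
df <- data.frame(id = 1:3, height_cm = c(160, 175, 182))

# Add a new variable derived from an existing one.
df$height_m <- df$height_cm / 100

# Add a row; the new row must have the same column names.
df <- rbind(df, data.frame(id = 4L, height_cm = 168, height_m = 1.68))

# Delete a column by assigning NULL to it.
df$height_cm <- NULL

# Delete a row with a negative row index (here, the second row).
df <- df[-2, ]

print(df)
```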