In the previous
article, we familiarized ourselves with the
SAS UE interface, SAS Base terminology, data types, basic requirements for the SAS Base syntax, and also considered common syntax errors.
Today I will tell you how you can access data in various formats. Please note that this article has links to documentation that will help you learn more about the examples in question.

What are SAS libraries?
A library in SAS is a method for centrally storing and transparently using data in SAS programs. The library can be a folder or directory on your computer or located on an external hard drive, a FLASH drive or a CD, and so on.
')
There are
two types of libraries: permanent and temporary. Permanent SAS libraries are retained until you delete them. The persistent library is available for processing in subsequent SAS sessions. The SAS temporary library exists only for the current SAS session.
SAS files created during a session are stored in a special workspace, which may or may not be external media. This workspace is usually assigned by default with the name Work. Files in the temporary working library can be used at any step in the SAS program, but they are not available for subsequent SAS sessions. Files stored in the working library are deleted at the end of the SAS session.
We assign the user library.
Consider a simple case of assigning a library: SAS data sets are in the same directory
c: \ habrahabr . There are two ways to solve this problem.
1 way:
Set up a library without software code. In the "Libraries" tab in SAS UE select "New Library":
Next, a window will appear to configure the user library:
The library name is library reference (or libref). Libref is the “nickname” (link) for the “repository” in which the files are located. The name of the library is set in accordance with the naming rules for variables in SAS (
see Lesson 1 ), but it is allocated no more than 8 characters.
The rules for naming libraries, variables, data sets, etc. in SAS can be found in the SAS 9.4 and SAS Viya 3.3 Reference Programming Documentation Documentation Documentation
in the SAS Language section.
Note that the library is assigned to the entire SAS session, but you can override its parameters.
Next, set the path to the SAS datasets.
After the library is assigned, it appears in the left pane of the SAS UE.
2 way:
Assign library programmatically. The purpose of the SAS library is implemented using the global operator LIBNAME. Information on the specified operator can be found in the
SAS 9.4 and SAS Viya 3.3 Programming Documentation / Global Statements directory.
Consider the general syntax of the global operator LIBNAME.
LIBNAME libref <engine> 'SAS-library' < options > <engine/host-options>;
libref is the name of the library.
engine is the name of the “engine”, for example, for SAS data sets it is BASE (but it can be omitted, it is set by default). If you want to create a new library with a different “engine” other than the default mechanism, you can cancel the automatic selection.
Reference books that may be useful to you when studying connection mechanisms:
SAS / ACCESS for Relational Databases and
SAS Engines .
The SAS / ACCESS engines are LIBNAME operator mechanisms that provide access to reading, writing, and updating more than 60 relational and non-relational databases, PC files, storage devices, and distributed file systems.
'SAS-library' - the path to the library, if the path is specified using a macro variable (will be considered in this series of articles), paired double quotes are used. In all other cases, you can use paired single quotes.
options are valid options. The simplest example is the ACCSESS = READONLY option, which assigns a read-only attribute to the entire SAS library. The entire list of valid options can be found in the
SAS 9.4 and SAS Viya 3.3 Programming Documentation / Global Statements directory.
engine / host-options — These are one or more parameters that are listed in the generic form keyword = value.
Consider the syntax of the operator LIBNAME in practice. Assign read-only
Habr library:
libname Habr "c:\habrahabr" access=readonly;
Run the code and check the Log:
Log messages indicate that the library is successfully assigned. As libraries, you can include data that is not physically in the format of SAS data sets (industrial databases, excel, access).
View the contents of the library SAS.
One of the options for viewing the contents of the library is to use the
PROC CONTENTS procedure. You can familiarize yourself with the procedures used in SAS in the
SAS 9.4 Procedures by Name and Product .
The PROC CONTENTS procedure allows you to create output that describes either the contents of the SAS library or the descriptor information for an individual SAS dataset. To view the contents of the SAS library, we can use the following general form of the procedure:
proc contents data = libref._ALL_ nods; run;
The NODS parameter (which means “no details”) suppresses printing detailed information about each file when specifying the _ALL_ option.
For a specific library, the code will look like this:
proc contents data=habr._all_ nods; run;
Fragment of the withdrawal procedure:
Note that other types of files are also stored in the library, for example
catalog ,
index . You can read about them in the
SAS 9.4 Companion for Windows, Fifth Edition reference book.
Files with member type DATA are standard SAS datasets. The second way to view the contents of the library is to use the
PROC DATASETS procedure:
proc datasets lib=habr; quit;
Viewing information about a specific SAS dataset is implemented as follows:
proc contents data=habr.charities; run;
Note the reference to the table in the user library. Name after data = two-level:
library_name.table_name . In the case of a dataset stored in the WORK temporary library, you can use the single-level name after data =.
For example, in the case of code:
proc contents data=charities; run;
displays information about the
charities dataset in the WORK temporary library.
Consider the output of the procedure for the
charities dataset in the
HABR user library:
The service information about the table resulting from the output is called a descriptor.
The descriptor contains general information about the data set: its name and library name, type, engine, date and time of creation and last change, number of observations and variables, total length of observations, indices, sorting, compression, page size and their number, information about variable attributes.
We read spreadsheets.
Reading the EXCEL file can be implemented in several ways. In this article we will look at the library assignment for the excel file.
To assign the SAS library, we will use the
products.xlsx spreadsheet stored in the
c: \ workshop \ habrahabr \ products.xlsx directory. This document is as follows: it contains 4 sheets, each of which will be a separate set of SAS data. A fragment of this document is presented below:
The general syntax for assigning a library is the same as in the case of SAS datasets, only the connection mechanism changes:
LIBNAME libref <engine> 'SAS-library' < options > <engine/host-options>;
There are several mechanisms for processing an excel file, all have their own characteristics and settings, which can be found in the
documentation .
- Excel
libname XL excel "c:\workshop\habrahabr\products.xlsx";
The result of the libname statement is shown below. Log fragment:
Information about the library will look through the procedure PROC CONTENTS:
proc contents data=xl._all_ nods; run;
Depending on the mechanism, the descriptor is filled in different ways:
- Xlsx
libname xl xlsx "c:\workshop\habrahabr\products.xlsx";
The result of the LIBNAME statement is presented below. Log fragment:
Information about the library will look through PROC CONTENTS:
proc contents data=xl._all_ nods; run;
Depending on the mechanism, the descriptor is filled in different ways:
- Pc files
libname XL pcfiles path='c:\workshop\habrahabr\products.xlsx';
The result of the libname statement is shown below.
Log fragment:
Information about the library will look through PROC CONTENTS:
proc contents data=xl._all_ nods; run;
Depending on the mechanism, the descriptor is filled in different ways:
Create detailed reports.
After gaining access to the required data, consider the procedure for creating PROC PRINT reports. Detailed information about it can be obtained in the
SAS 9.4 Procedures by Name and Product . We print the detailed report using the
German table from the
sasuser system library.
First, let's examine the descriptor of the specified table, we are interested in the attributes of the columns:
proc contents data=sasuser.german; run;
Fragment of the withdrawal procedure:
Create a detailed report that meets the requirements below:
- Do not include the Change and Retain variables in the report :
proc print data=sasuser.german; var ID Pre Group Gender; run;
The var operator defines variables for printing. The operator displays them in the order in which you listed them.
Fragment output:
- Print only those observations, the values of the Pre variable in which do not exceed 70.00 and the values of the Group variable are equal to Treatment.
In this condition, we need to use a filter in the WHERE statement.
proc print data=sasuser.german; var ID Pre Group Gender; where Pre le 70 and Group='Treatment'; run;
Please note that when working with text variables, case is important, and you must also enclose the required value in double quotes (double or single).
The output of this SAS program:
- Do not print line numbers.
By default, the PROC PRINT procedure displays line numbers, in order to remove this column, you can use the NOOBS option ('no observation'). In this case, the program code looks like this:
proc print data=sasuser.german noobs; var ID Pre Group Gender; where Pre le 70 and Group='Treatment'; run;
Comparison operators can be written in familiar characters, or you can use mnemonics , as shown in the example. The output of this program is presented below:
- The variable ID is identified by an identifying column.
The observation identifier can define any variable. When you specify one or more variables in the ID operator, it uses the formatted values of these variables to identify strings. Note that if a variable is simultaneously specified in the VAR operator and in the ID operator, then it will be displayed twice. Also, when using the ID operator, there is no need for the NOOBS option.
In our case, the SAS program will look like this:
proc print data=sasuser.german; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; run;
The results of the code are presented below:
- Set shortcuts to variables
It is worth noting that when defining attributes such as a label and a format, they will be used only at a certain PROC step to create the required report.
To set a shortcut, use the LABEL operator.
The general syntax of the LABEL statement is as follows:
LABEL variable-1=label-1...<variable-n=label-n>;
In the label, you can use any characters, including spaces, the number of characters should not exceed 256. Variable labels will be used to create reports.
Not all procedures “see” labels. In order for the PROC PRINT procedure to print labels to the report, rather than variable names, you must specify label (or split =) in the options. The SPLIT option specifies a separator that controls breaks in column headers. We use the LABEL operator in our program code:
1 option
proc print data=sasuser.german label; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit' Group = 'Customer Group' Gender = 'Customer Gender'; run;
Option 2
(using the split = option)
The split = option specifies a separator (required in quotes). The code in this case is as follows (note the use of delimiters in the Label operator):
proc print data=sasuser.german split='*'; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit' Group = 'Customer*Group' Gender = 'Customer*Gender'; run;
- Set the report header and footer
The TITLE statement sets the title in the report, the FOOTNOTE statement sets the footer.
As in the case of the LIBNAME statement, these statements are global and are valid throughout the entire SAS session until you redefine their values. You can use TITLE and FOOTNOTE both outside the PROC steps and directly in them.
General syntax of statements:
TITLE<1...10> <text-options> <"text-string–1"> ... <text-options> <"text-string-n"> ;
FOOTNOTE<1...10> <text-options> <"text-string–1"> ... <text-options > <"text-string-n"> ;
Text-string - this argument is a string that can contain up to 512 characters. You need to enclose such strings in single or double quotes. The text is displayed exactly as you enter it in the operator, including uppercase, lowercase letters and spaces.
For the title in the report, you can use 10 TITLE statements (similarly to FOOTNOTE). It is important to understand how values are redefined. The logic for the execution of statements is the same. Consider overriding header values using the example of the TITLE operator.
Code | Result | Note |
---|
TITLE 'String number one'; | String number one | TITLE and TITLE1 are equivalent |
TITLE 'String number three' | String number one
String number three
| Line 2 is empty |
TITLE2 'String number two'; | String number one String number two | TITLE2 set the header value on the second line and deleted the header values below. |
TITLE1 'New Title'; | New title | TITLE1 set the header value on line 1 and deleted the header values below. |
TITLE; | | An empty TITLE statement removes all headers.3 |
In other words, the TITLEn operator overrides the header value in row n and deletes the header values of rows c n + 1 to 10.
You can also use styles to set headers and footers:
title bold Italic color=brown bcolor=darkseagreen height=5'Habrahabr';
Result of performance:
So, back to the report being developed:
title 'Example for Habrahabr'; footnote 'Created by Anna Dobrychenko'; proc print data=sasuser.german split='*'; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit' Group = 'Customer*Group' Gender = 'Customer*Gender'; run; title; footnote;
The result of the program:
- Set the Pre format variable
Format is a rule for outputting variable values to a report. You must understand that the format does not change the values in the SAS dataset. The types of formats correspond to the data type, but are divided into categories: numeric, character, date, time, date-time.
All format information can be found in the SAS 9.4 Formats and Informats: Reference. The ability to create custom formats is also supported, we'll talk about this in the following articles.
The general syntax of the FORMAT statement.
FORMAT variable-1 <...variable-n> <format> <DEFAULT=default-format>;
variable - one or more variables to which the format should be applied.
DEFAULT = default-format - defines the default temporary format for displaying the values of variables that are not specified in the FORMAT statement, used in the DATA step.
format - defines the format that is used to display variables.
The general syntax for using a format in the FORMAT statement is as follows:
<$>format<w>.<d>
$ - text format attribute
Format - format name
w - the width of the format, the number of all displayed characters in the value
d - the number of decimal places
The format always ends in a dot or in the number of decimal places. It is worth noting that if the width of the format is chosen incorrectly, the values in the output can be cut off. Let's look at an example:
Variable value | Format | Result |
---|
34566.78 | DOLLAR10.2 | $ 34,566.78 |
34566.78 | DOLLAR9.2 | $ 34,566.78 |
34566.78 | DOLLAR8.2 | 34566.78 |
34566.78 | DOLLAR7.2 | 34566.8 |
34566.78 | DOLLAR6.2 | 34567 |
34566.78 | DOLLAR4.2 | 35E3 |
The value in the SAS dataset remains unchanged:
data test; Var=34566.78; CurrentValue=Var; FORMAT Var dollar4.2; run; proc print data=test noobs; run;
The result of the step:
So, let's apply the format to the variable Pre:
title 'Example for Habrahabr'; footnote 'Created by Anna Dobrychenko'; proc print data=sasuser.german split='*'; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit' Group = 'Customer*Group' Gender = 'Customer*Gender'; format Pre dollar6.2; run; title; footnote;
The report looks like this:
- Group the report by the Gender variable, and leave the Pre, Group and Gender variables in the report.
To group the variables, use the BY operator. Grouping variables by specific values implies sorting the table. This is due to the processing of SAS Base data.
You can sort the data set using the PROC SORT procedure.
When sorting a dataset, you need to specify the source (source dataset), the grouping variable or variables, and, if necessary, the output (intermediate) dataset.
Let's check the above. If you look at the report displayed in item 7 of this article, the Gender column is sorted by gender in descending order. Is it so?
title 'Example for Habrahabr'; footnote 'Created by Anna Dobrychenko'; proc print data=sasuser.german split='*'; var Pre Group Gender; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit' Group = 'Customer*Group' Gender = 'Customer*Gender'; format Pre dollar6.2; BY descending Gender; run; title; footnote;
The result of the procedure is presented below:
Add a few options to create a final report: nobyline and #byVAL ()
options nobyline; title 'Example for Habrahabr'; title2 'Report for Gender: #BYVAL(Gender) and for Group: Treatment'; footnote 'Created by Anna Dobrychenko'; proc print data=sasuser.german split='*'; var Pre; where Pre le 70 and Group='Treatment'; ID ID; label ID = 'Identifier' Pre = 'Pre credit'; format Pre dollar6.2; BY descending Gender; run; title; footnote; options byline;
Thus, we obtained the required detailed report based on the German set in the Sasuser library.
So, this is briefly about organizing data access and creating detailed reports. In the next article, we will look at reading external files, examine the INFILE and INPUT operators, and the PROC IMPORT procedure.
And, of course, traditionally - Grow with SAS!