Hello! I am Anna Dobrychenko, an instructor at the SAS Training Center in Russia. I teach SAS Base programming, SAS Enterprise Guide, and SAS Visual Analytics, and I take part in training interns. All technical documentation and articles on SAS products and solutions are written in English, and there is not enough localized material on the RuNet.
Therefore, in our blog on Habrahabr I will cover the basics of programming in SAS Base in a series of articles.
The SAS Base language underlies most SAS products and solutions, so the articles will be devoted to it. In them I will introduce the terminology, the data types SAS Base works with, and the structure of SAS Base code, and I will show the basic techniques used when writing SAS programs.
Introduction
There are several ways to learn the basics of programming in SAS Base.
- Read the documentation: all SAS solutions are well documented, which makes them approachable even for a beginner. All reference guides are on the support site. The advantage is that you can find everything you need for free; the drawbacks are that it takes a long time and all the information is in English.
- Take a free online course in e-learning format. Pros: everything is explained in detail, with examples and practical exercises. Cons: it takes a long time (24 hours) and is in English. You can also watch SAS videos on YouTube.
- Buy a book, for example The Little SAS Book. It introduces novice users to the SAS Base language and contains practical examples and exercises. But again, all the literature published by SAS is in English and is relatively expensive. All books are listed on the official SAS website in the "Training" section; you can also find them on Amazon. If you are a student and take part, for example, in our internship program, you might be lucky enough to receive a book as a gift.
- Read our series of articles on the basics of programming in SAS Base. These articles are addressed to new SAS users working in various areas of business, and more broadly to everyone who is going to analyze data with SAS solutions or write their own programs in SAS Base.
I will try to introduce the SAS Base language through concrete practical examples, with brief explanations and a minimum of technical jargon.
Getting the tools
You can learn to program in SAS Base with a free product called SAS University Edition, or SAS UE for short.
SAS UE is a powerful tool provided by the SAS Institute. The user interface of SAS UE, known as SAS Studio, is a web client that runs in a browser.
You can download it from the SAS website for free. On a PC, SAS UE runs in a virtual machine and requires virtualization software to be installed. On Windows, Oracle VM VirtualBox and VMware Player are suitable. All installation information can be found in the installation document published by SAS.
When you run a program or task, SAS Studio connects to the SAS server to process the code. The SAS server can be located in the cloud, on-premises, or on a local computer. After processing the code, the results are returned to SAS Studio in your browser.
SAS Studio supports several web browsers: Microsoft Internet Explorer, Apple Safari, Mozilla Firefox, and Google Chrome.
Understanding the interface
A few words about what the SAS UE interface looks like.
The navigation pane is on the left and the workspace is on the right. The three main tabs of the workspace are “Code”, “Log”, and “Results”.
Syntax help appears as you type and the list of matching keywords narrows. It also appears if you right-click a keyword in a program and select Syntax Help.
You can go to the documentation page for a particular procedure by clicking the “Product Documentation” link.
On the “Output Data” tab you can see the tables that were created.
On the “Results” tab you can view the output of the procedures that generate reports.
If you use some part of a program frequently, you can add it to the "Code snippets".
After running any program, even the simplest one, I recommend opening and reading the Log. The Log is a tool for diagnosing and debugging potential problems in your program. It shows the text of the program that was run, along with three types of messages: notes, warnings, and errors. Even if no error is immediately visible, read the Log carefully.
You can open SAS Help and the documentation directly from the main toolbar.
Select SAS Studio Help to go to the SAS Studio documentation page. This web page contains help for the SAS Studio interface.
If you are just starting to study SAS products, you are probably unfamiliar with some of the terminology used in them.
Sas7bdat and data
To begin with, SAS Base works only with a special data format called the SAS data set. At the same time, SAS is a very flexible tool and can read almost any data, converting it into a SAS data set. A SAS data set is a regular flat table consisting of rows and columns. It is stored as a file with the .sas7bdat extension.
In traditional SAS terminology, data sets consist of variables and observations. In relational database terms, variables are columns and observations are rows.
Consider an example.
The following program creates a table of people (we will look at the syntax later):
data people;
    infile datalines dlm=' ' dsd missover;
    length Id 8 First_Name $12 Last_Name $20 Phone_number $20;
    input Id First_Name Last_Name Phone_number;
    /* With DSD, two consecutive delimiters mark a missing value (see the last data line) */
    datalines;
125001 Gregory Backer +1-567-244-5678
245002 Albert Hardman +1-862-444-3333
126003 Amanda Wesley .
. Gloria Carter +1-963-542-2154
111005 Colin  +1-964-584-1111
;
Source data can come in many forms, but SAS keeps typing simple: there are only two data types, numeric and character. In the people data set, the variables First_Name, Last_Name, and Phone_number are character, and the variable Id is numeric. Note that dates in SAS are also numbers.
The following program creates a data set named time that contains the current date, the current time, and the current date and time (datetime) as SAS values:
data time;
    Current_date = today();
    Current_time = time();
    Current_datetime = datetime();
run;
A possible view of the data set is presented below:
All three values are stored as numbers. A SAS date is the number of days from January 1, 1960 to the given date; a SAS time is the number of seconds since midnight of the current day; a SAS datetime is the number of seconds since midnight of January 1, 1960. This is how dates and times are stored in SAS data sets.
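To make the numeric nature of these values concrete, here is a minimal sketch that writes a few date, time, and datetime constants to the Log without any format. The variable names are purely illustrative and are not taken from the program above.

```sas
data _null_;
    first_day    = '01JAN1960'd;            /* day zero of the SAS calendar */
    next_day     = '02JAN1960'd;            /* one day later                */
    one_minute   = '00:01:00't;             /* 60 seconds after midnight    */
    first_minute = '01JAN1960:00:01:00'dt;  /* 60 seconds after the origin  */
    /* Prints: first_day=0 next_day=1 one_minute=60 first_minute=60 */
    put first_day= next_day= one_minute= first_minute=;
run;
```

Because the values are plain numbers, ordinary arithmetic works on them. For example, subtracting one date from another gives the number of days between them.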
Sometimes the data is incomplete, as in the people data set, where some values are missing. SAS has the concept of a missing value: a value that indicates there is no data stored for a variable in a particular observation. By default, SAS represents a missing numeric value as a period and a missing character value as a blank. In comparisons, a missing value is equal to another missing value, and an ordinary numeric missing value compares lower than any number.
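The comparison behavior described above can be checked with a short sketch. The variable names num and chr are assumptions made for illustration; MISSING() is a standard SAS function that works for both numeric and character arguments.

```sas
data _null_;
    num = .;      /* numeric missing value   */
    chr = ' ';    /* character missing value */

    /* An ordinary numeric missing value is lower than any number */
    if num < -1000000 then put 'Numeric missing is below any number';

    /* A missing value compares equal to another missing value */
    if num = . then put 'Missing equals missing';

    /* MISSING() detects missing values of both types */
    if missing(num) and missing(chr) then put 'Both values are missing';
run;
```

All three messages should appear in the Log. This is worth remembering when filtering: a condition such as `num < 0` also selects observations where num is missing.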
In addition to the actual values, a SAS data set stores descriptive information about each variable, such as its type, length, name, label, and format. These are called attributes.
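One way to look at these attributes is the CONTENTS procedure. The sketch below assumes that the sample data set sashelp.class, which normally ships with SAS, is available in your environment; any other data set name can be substituted.

```sas
/* List the descriptor information of a data set:
   variable names, types, lengths, formats, and labels */
proc contents data=sashelp.class;
run;
```

The report appears on the Results tab and includes a table with one row per variable.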
Variables and Attributes
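Variables in SAS have a number of attributes. Let's get acquainted with some of them.
Variable length is the number of bytes used to store each value of the variable. For a character variable in a single-byte encoding, that is one byte per character.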
This code demonstrates the above:
data _null_;
    word1 = 'SAS Institute';
    LEN = length(word1);
    putlog 'Length of word1 is ' LEN;
run;
Log fragment:
Variable names, like data set names, are chosen by the user. There are a number of rules for naming SAS variables:
- Names must not exceed 32 characters.
- Names must begin with a letter or an underscore.
- Names can contain only letters, digits, and underscores.
- Names cannot contain special characters, including spaces.
- Names can contain both uppercase and lowercase letters, because SAS is not case sensitive when naming entities (variables, data sets, libraries, and so on). You can refer to a variable in code in any case. But note that SAS remembers the case used in the first appearance of the variable name in the program and uses it when generating reports.
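The last rule can be illustrated with a short sketch. The data set name names_demo and the values are assumptions made for this example.

```sas
data names_demo;
    First_Name = 'Anna';               /* first appearance fixes the displayed case */
    FIRST_NAME = upcase(first_name);   /* same variable, referenced in other cases  */
run;

/* The report shows a single column headed First_Name */
proc print data=names_demo;
run;
```

All three spellings refer to one variable, so the data set contains a single column, and its heading uses the spelling from the first appearance.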
Variable format is a rule for how values are displayed in a report. It is important to understand that the values stored in the table do not change. Below is an example of how a date and a time can be presented in a report while still being stored in the source table as numbers.
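As a minimal sketch of the idea, the program below assigns the standard formats DATE9., TIME8., and DATETIME20. to the three variables from the earlier time example. The data set name time_formatted is an assumption.

```sas
data time_formatted;
    Current_date     = today();
    Current_time     = time();
    Current_datetime = datetime();
    /* Formats change only how the values are displayed, not what is stored */
    format Current_date     date9.
           Current_time     time8.
           Current_datetime datetime20.;
run;

proc print data=time_formatted;
run;
```

The report shows readable dates and times, while the data set itself still stores the underlying numbers.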
In subsequent articles, we will learn more about the format attribute.
The variable label is used in reports instead of the variable name. A label can contain up to 256 characters, including special characters and spaces. In subsequent articles, we will look at the kinds of labels and how they are used in program code, as well as techniques that allow, for example, a space in a variable name.
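Here is a small sketch of a label in use. The data set name contacts and the label text are assumptions made for illustration. Note that PROC PRINT shows labels only when the LABEL option is specified.

```sas
data contacts;
    length Phone_number $20;
    Phone_number = '+1-567-244-5678';
    /* The label may contain spaces and special characters */
    label Phone_number = 'Contact phone number';
run;

/* The LABEL option tells PROC PRINT to use labels as column headings */
proc print data=contacts label;
run;
```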
SAS Base program structure
Let's get acquainted with the structure of a program in the SAS Base language.
All SAS programs consist of only two kinds of steps: the DATA step and the PROC step. The DATA step is intended for reading, transforming, and creating SAS data sets, and the PROC (procedure) step is mainly for analyzing data and generating and printing reports. Steps consist of statements. A step ends with the RUN keyword (or, for a number of procedures, QUIT); the STOP and ABORT statements can also end the execution of a step. Steps can be placed in any order. SAS processes them sequentially, one step at a time, and within a step the statements are read in order, line by line. Different parts of a program exchange data with each other in the form of SAS data sets.
The SAS Base syntax is very simple, and so is writing the code.
An example of the simplest SAS program is presented below:
data new;
    set sasuser.ads;
run;

proc print data=new;
run;
An important syntax requirement is the semicolon at the end of each statement. This SAS program reads the ads data set from the sasuser library (we will get acquainted with the notion of a library in the next article) and creates a new data set named new. The next step creates a report from the new data set.
Note that SAS Base has no requirements for code formatting. You can write the whole program on one line and it will still work. To format the code in SAS UE, click the "code format" button:
It is good practice to add explanations to the source code of a program. Comments do not affect the semantics of the program.
There are two kinds of comments in SAS Base:
- A commented-out statement, which starts with an asterisk and ends with a semicolon: * ... ;
data new;
    set sasuser.ads;
run;

proc print data=new;
    *var Sales;
run;
Log fragment:
- A block comment, enclosed in /* and */:
/*Create new data set*/
data new;
    set sasuser.ads;
run;

/*Create new report*/
proc print data=new;
run;
Log fragment:
As noted earlier, the Log should be studied in detail. Let's look at some of the most common syntax errors:
- A typo in the DATA or PROC keyword.
daat new;
    set sasuser.ads;
run;

proc print data=new;
run;
In this case, the step still runs, but with a warning. The Log shows the following information:
- A missing semicolon.
data new;
    set sasuser.ads;
run;

proc print data=new
run;
In this case, the word run in the second step is treated as an option of the PROC PRINT statement.
The error looks like this:
- Unmatched quotes. In SAS Base, you can use single and double quotes, which must be paired.
data test;
    infile datalines dlm=',;
    input xyz;
    datalines;
1,2,3
;
With unmatched quotes, the syntax highlighting in the editor changes, and the following message is written to the Log (Log fragment):
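For comparison, here is a sketch of how the last example might look with the quotes paired. The variable names x, y, and z are assumptions: the original program reads a single variable xyz, and three names are used here only so that each comma-separated value lands in its own variable.

```sas
data test;
    /* The delimiter is now enclosed in a matching pair of quotes */
    infile datalines dlm=',';
    input x y z;
    datalines;
1,2,3
;
run;

proc print data=test;
run;
```

With the closing quote in place, the editor highlights the remaining statements as code again and the step reads one observation with three numeric values.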
So, this has been an overview of the SAS UE interface, the SAS Base terminology, and the basic requirements of SAS Base syntax. In the next article, we will look at SAS libraries and how to create them, at creating detailed reports, and at formatting values and assigning permanent attributes to variables.
I am sure that working with SAS will be interesting and exciting. Grow with SAS!