SAS PLOT

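This post summarizes plotting in SAS and doubles as a set of code templates to look up (read: copy) later.

We start with the most common chart types.

1. Histogram

A histogram shows how the data are distributed. It suits continuous measurement data, and it also lets you check whether the data look normal or skewed.

In SAS, histograms can be drawn with PROC UNIVARIATE, PROC SGPLOT, or PROC GCHART (PROC GPLOT, which draws scatter and line plots, has no histogram statement).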

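/* Work on a copy of the SASHELP.CARS sample data */
data test;
    set sashelp.cars;
run;

/* Simplest histogram: one HISTOGRAM statement in PROC UNIVARIATE */
proc univariate data=work.test;
    histogram length;
run;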

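That gives the simplest possible histogram. In practice, though, we usually want more from it:

  • First, manual control over the width of each bar, plus the ability to adjust other details;

  • Second, a smooth curve that summarizes the overall shape of the data;

  • Third, separate displays of the variable for each level of a grouping variable.

For this, PROC UNIVARIATE is no longer quite enough. It is first and foremost a statistical analysis procedure and plotting is only a side feature, so its graphics options are limited. A dedicated plotting procedure is the better tool, which brings us to: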

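PROC SGPLOT: Statistical Graphics

In PROC SGPLOT, a HISTOGRAM statement can be combined only with DENSITY statements (and with other HISTOGRAM statements), giving a histogram plus density curve. It cannot be overlaid directly with the other plot types.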
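As a small illustration of that pairing, the sketch below overlays two DENSITY statements on one histogram. It assumes the SASHELP.CARS sample data; the legend labels and the KEYLEGEND placement are illustrative choices, not something the syntax requires.

```sas
/* Histogram of LENGTH with a fitted normal curve and a kernel estimate */
proc sgplot data=sashelp.cars;
    histogram length;
    density length / type=normal legendlabel="Normal fit";
    density length / type=kernel legendlabel="Kernel estimate";
    keylegend / location=inside position=topright;  /* legend inside the plot area */
run;
```

Comparing the two curves is a quick visual check of how far the data depart from normality.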

The general syntax of the HISTOGRAM statement is:

HISTOGRAM response-variable </ option(s)>;

Histogram options:
BOUNDARY= LOWER | UPPER
FILL | NOFILL
FILLATTRS= style-element | (COLOR= color)
FREQ= numeric-variable
OUTLINE | NOOUTLINE
SCALE= COUNT | PERCENT | PROPORTION
SHOWBINS

Plot options:
LEGENDLABEL= "text-string"
NAME= "text-string"
TRANSPARENCY= numeric-value
X2AXIS
Y2AXIS

Only response-variable is required; everything else is optional. Each option is described below:

BOUNDARY= LOWER | UPPER
specifies how boundary values are assigned to bins.
LOWER
specifies that boundary values are assigned to the lower bin.
UPPER
specifies that boundary values are assigned to the upper bin.
Default: UPPER

FILL | NOFILL
specifies whether the area fill is visible. The FILL option shows the area fill. The
NOFILL option hides the area fill.
Default: The default status of the area fill is specified by the DisplayOpts attribute
of the GraphHistogram style element in the current style.

FILLATTRS= style-element | (COLOR= color)
specifies the appearance of the area fill. You can specify the color of the fill by using
a style element or by using the COLOR= suboption. For more information about
specifying colors, see the "SAS/GRAPH Colors and Images" chapter in the
SAS/GRAPH: Reference.
Note: This option has no effect if you specify the NOFILL option.
Default: The default color is specified by the Color attribute of the
GraphDataDefault style element in the current style.

FREQ= numeric-variable
specifies that each observation is repeated n times for computational purposes, where
n is the value of the numeric variable. If n is not an integer, then it is truncated to
an integer. If n is less than 1 or missing, then it is excluded from the analysis.

LEGENDLABEL= "text-string"
specifies a label that identifies the histogram in the legend. By default, the label of
the response variable is used.

NAME= "text-string"
specifies a name for the plot. You can use the name to refer to this plot in other
statements.

OUTLINE | NOOUTLINE
specifies whether outlines are displayed for the bars. The OUTLINE option shows
the outlines. The NOOUTLINE option hides the outlines.
Default: The default status of the outlines is specified by the DisplayOpts attribute
of the GraphHistogram style element in the current style.

SCALE= COUNT | PERCENT | PROPORTION
specifies the scaling that is applied to the vertical axis. Specify one of the following
values:
COUNT: the axis displays the frequency count.
PERCENT: the axis displays values as a percentage of the total.
PROPORTION: the axis displays values as proportions (0.0 to 1.0) of the total.
Default: PERCENT

SHOWBINS
specifies that the midpoints of the value bins are used to create the tick marks for
the horizontal axis. By default, the tick marks are created at regular intervals based
on the minimum and maximum values.

TRANSPARENCY= numeric-value
specifies the degree of transparency for the histogram. Specify a value from 0.0
(completely opaque) to 1.0 (completely transparent).
Default: 0.0

X2AXIS
assigns the response variable to the secondary (top) horizontal axis.

Y2AXIS
assigns the calculated values to the secondary (right) vertical axis.
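To make a few of these options concrete, here is a minimal sketch that uses the ones not exercised elsewhere in this post: SCALE=, FILLATTRS=, NOOUTLINE, LEGENDLABEL=, NAME= and FREQ=. It assumes the SASHELP.CARS sample data. The data set work.binned, its variables value and count, and the color lightblue are made up purely for illustration.

```sas
/* Counts instead of percentages, a custom fill color, no bar outlines */
title "Length of cars (frequency counts)";
proc sgplot data=sashelp.cars;
    histogram length /
        scale=count                    /* vertical axis shows counts */
        fillattrs=(color=lightblue)    /* fill color of the bars */
        nooutline                      /* hide the bar outlines */
        legendlabel="Length"           /* text shown in the legend */
        name="len";                    /* name used by KEYLEGEND below */
    keylegend "len";
run;
title;

/* FREQ=: each row stands for COUNT observations (hypothetical pre-counted data) */
data work.binned;
    input value count;
    datalines;
1 3
2 7
3 12
4 6
5 2
;
run;

proc sgplot data=work.binned;
    histogram value / freq=count scale=count;
run;
```

FREQ= is handy when the data arrive already tabulated, since it avoids expanding the table back into one row per observation.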

That looks complicated, so let's try most of them at once and see what happens:

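title "Histogram Plot Using SGPLOT with ALL Options";
proc sgplot data=work.test;
    histogram length /
        boundary=lower      /* boundary values go to the lower bin */
        fill                /* show the bar fill */
        outline             /* show the bar outlines */
        showbins            /* tick marks at the bin midpoints */
        transparency=0.8    /* 0 = opaque, 1 = fully transparent */
        binwidth=10         /* bin width in data units; not in the option list above */
    ;
    density length / type=kernel;   /* overlay a kernel density estimate */
run;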

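To draw a separate histogram for each value of origin, split the data with a BY statement. BY-group processing in SAS expects the data to be sorted by the BY variable, so sort first and then add the BY statement to the plotting step.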

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=work.test;
    by origin;
run;

title "Histogram Plot Using SGPLOT with ALL Options By Origin";
proc sgplot data=work.test;
    histogram length /
        boundary=lower
        fill
        outline
        showbins
        transparency=0.8
        binwidth=10
    ;
    density length / type=kernel;
    by origin;      /* one graph per value of ORIGIN */
run;

This produces three histograms, one per value of origin, which is exactly what we wanted.
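BY-group graphs are easier to compare when they share axis ranges and carry the group value in the title. The following is a hedged sketch of one way to do that, assuming the SASHELP.CARS sample data. The output data set name work.cars_sorted is made up for illustration, and how UNIFORM= treats the bins of a histogram may depend on the SAS release, so check the result on your own system.

```sas
/* Sort into a new data set so the original is left untouched */
proc sort data=sashelp.cars out=work.cars_sorted;
    by origin;
run;

options nobyline;                                  /* suppress the default BY line */
title "Length of cars, Origin = #byval(origin)";   /* insert the BY value into the title */
proc sgplot data=work.cars_sorted uniform=scale;   /* request shared axis scaling across BY groups */
    histogram length;
    density length / type=kernel;
    by origin;
run;
options byline;                                    /* restore the default */
title;
```

The trade-off is that shared axes can leave empty space in groups whose values cover a narrower range.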

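Minimal general syntax

As you can see, PROC SGPLOT syntax is fairly regular and its plotting model is easy to follow, although it does have a few quirks.

If you do not need to keep adjusting individual elements of the figure, the simplest form is enough: pass the data to PROC SGPLOT and call the plot statement for the chart you want.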

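Common plot statements:

Usage                  Statement
Histogram              histogram variable
Scatter plot           scatter
Line chart             hline/vline
Box plot               vbox/hbox
Bar chart              vbar/hbar
Time series plot       series
Regression plot        reg
Template-based plot    GTL (Graph Template Language; not an SGPLOT statement)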
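As a rough template, each of the plot statements listed here can be called in its simplest form as sketched below. The examples assume the SASHELP.CARS and SASHELP.STOCKS sample data sets that ship with SAS; the variable choices are only illustrative.

```sas
/* Scatter plot */
proc sgplot data=sashelp.cars;
    scatter x=weight y=mpg_city;
run;

/* Regression fit (REG also draws the data points by default) */
proc sgplot data=sashelp.cars;
    reg x=weight y=mpg_city;
run;

/* Box plot of LENGTH, one box per ORIGIN */
proc sgplot data=sashelp.cars;
    vbox length / category=origin;
run;

/* Bar chart: number of cars per ORIGIN */
proc sgplot data=sashelp.cars;
    vbar origin;
run;

/* Line chart: mean city MPG per car TYPE */
proc sgplot data=sashelp.cars;
    vline type / response=mpg_city stat=mean;
run;

/* Time series: closing price of one stock over time */
proc sgplot data=sashelp.stocks;
    where stock = "IBM";
    series x=date y=close;
run;
```

The horizontal variants (hbar, hbox, hline) take the same arguments and simply swap the axes.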

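Beyond this there are details such as combining plots, legend placement, detailed KEYLEGEND settings, and colors. These can be explored gradually (and in practice they are often not needed...).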

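One more important need: in SAS, some plot types can be combined and some cannot. For example, vbar and vline can be overlaid directly, but scatter and vbar cannot. What then?

The SG procedure family offers two answers. PROC SGSCATTER is a relatively specialized one, while PROC SGPANEL is the more general one. Both lay out several plots as cells of a single figure, which gives a lot of freedom in combining views of the data.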
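Until that follow-up arrives, here is a minimal sketch of the three ideas mentioned here: a direct vbar plus vline overlay in PROC SGPLOT, a paneled histogram with PROC SGPANEL, and a scatter plot matrix with PROC SGSCATTER. It assumes the SASHELP.CARS sample data, and the variables and options are illustrative choices only.

```sas
/* vbar and vline share a category axis, so they can be overlaid */
proc sgplot data=sashelp.cars;
    vbar  origin / response=msrp stat=mean legendlabel="Mean MSRP";
    vline origin / response=horsepower stat=mean y2axis
                   legendlabel="Mean horsepower";   /* second scale on the right */
run;

/* SGPANEL: one cell per ORIGIN, no sorting needed */
proc sgpanel data=sashelp.cars;
    panelby origin / columns=3;
    histogram length;
    density length / type=kernel;
run;

/* SGSCATTER: pairwise scatter plots, colored by ORIGIN */
proc sgscatter data=sashelp.cars;
    matrix length weight horsepower / group=origin;
run;
```

Unlike the BY statement, PANELBY puts all groups into one image, which makes side-by-side comparison easier.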

As for the details...

Next time, for sure!