This article provides code for generating regular reports on the state of EMC VNX storage disks, together with the history of how it came about and an alternative approach.
I tried to write the code in a single file with the most detailed comments possible; you only need to substitute your own passwords. The format of the source data is also described, so I will be glad if someone tries this at home.
You can skip ahead if the backstory does not interest you.
We have a data center with storage systems that are not exactly new. There are many of them, and disk failures are frequent: several times a week someone goes to the data center and replaces disks. The decision to replace a disk is made when the system raises a "Recommended disk replacement" alarm.
Nothing unusual.
But recently, individual LUNs built on these storage systems and presented to the virtual environment began to degrade seriously. After talking to the vendor's technical support, it became clear that disks should be replaced not only when the alarm above appears, but also when a large number of other messages appear that the system does not treat as critical errors.
These storage systems do not support SNMP monitoring. You either use expensive proprietary software (which we don't have) or the NaviSECCli console utility, which has to connect to each of the two controllers of every storage system; neither option was very appealing.
So we decided to automate collecting the logs and searching them for errors, while leaving the decision to replace a disk to the responsible engineers, based on their analysis of the report.
Initially, one of my colleagues wrote PowerShell code that did the following: read the controller list from a csv, sequentially queried the event log of each storage processor via NaviSECCli, filtered out the interesting error codes, fetched the serial number of each affected disk, and exported the result to csv.
The code is below. I should say right away that it works, but we have since moved to an alternative solution.
```powershell
cd 'd:\Navisphere CLI\'
$csv = "D:\VNX-IP.csv"
$Filter1 = "name1"
$Filter2 = "name2"
$Filter3 = "name3"
$Data = Import-Csv $csv -Delimiter ';' |
    Where {$_.cl -EQ $Filter1 -Or $_.cl -EQ $Filter2 -Or $_.cl -EQ $Filter3} |
    Sort-Object -Property @{Expression={$_.cl}; Ascending=$true}, @{Expression={$_.Name}; Ascending=$true}
#$Filter1 = "nameOfcl"
#$Data = Import-Csv $csv -Delimiter ';' | Where {$_.Name -EQ $Filter1}
$Data | select Name,IP,cl

# Report period: the last 30 days
$yStart = (Get-Date).AddDays(-30).ToString('yyyy')
$yEnd = (Get-Date).ToString('yyyy')
$mStart = (Get-Date).AddDays(-30).ToString('MM')
$mEnd = (Get-Date).ToString('MM')
$dStart = (Get-Date).AddDays(-30).ToString('dd')
$dEnd = (Get-Date).ToString('dd')
#$start = (Get-Date).AddDays(-3).ToString('MM\/dd\/yy')
#$end = (Get-Date).ToString('MM\/dd\/yy')

$i = 1
$table = ForEach ($row in $Data) {
    Write-Host $row.Name -ForegroundColor "Yellow"
    Write-Host "SP A"
    Write-Host (Get-Date).ToString('HH:mm:ss')
    # Pull the log from controller A and keep only the interesting error codes
    $txt = .\NaviSECCli.exe -scope 0 -h $row.newA -user myusername -password mypassword getlog -date $mStart/$dStart/$yStart $mEnd/$dEnd/$yEnd |
        Select-String -Pattern "\(820\)","\(803\)","\(801\)","\(920\)","\(901\)"
    ForEach ($n in $txt) {
        $x = $n -Split(' ')
        $disk = $x[3] + "_" + $x[5] + "_" + $x[7].Split("(")[0]
        # A second CLI call per error line to fetch the disk serial number
        $sn = (.\NaviSECCli.exe -scope 0 -h $row.newA -user myusername -password mypassword getdisk $disk -serial)[1] |
            %{$_ -replace "Serial Number: ",""} | %{$_ -replace "State: ",""} | %{$_ -replace " ",""}
        New-Object PSObject -Property @{
            i = $i
            cl = $row.cl
            Storage = $row.Name
            SP = "A"
            Date = $x[0]
            Time = $x[1]
            Disk = $disk
            Error = (($n -Split('\['))[0] -Split('\)'))[1].Trim()
            eCode = (($n -Split('\('))[1] -Split('\)'))[0]
            SN = $sn
        }
        $i = $i + 1
    }
    Write-Host "SP B"
    Write-Host (Get-Date).ToString('HH:mm:ss')
    $txt = .\NaviSECCli.exe -scope 0 -h $row.newB -user myusername -password mypassword getlog -date $mStart/$dStart/$yStart $mEnd/$dEnd/$yEnd |
        Select-String -Pattern "\(820\)","\(803\)","\(801\)","\(920\)","\(901\)"
    ForEach ($n in $txt) {
        $x = $n -Split(' ')
        $disk = $x[3] + "_" + $x[5] + "_" + $x[7].Split("(")[0]
        $sn = (.\NaviSECCli.exe -scope 0 -h $row.newB -user myusername -password mypassword getdisk $disk -serial)[1] |
            %{$_ -replace "Serial Number: ",""} | %{$_ -replace "State: ",""} | %{$_ -replace " ",""}
        New-Object PSObject -Property @{
            i = $i
            cl = $row.cl
            Storage = $row.Name
            SP = "B"
            Date = $x[0]
            Time = $x[1]
            Disk = $disk
            Error = (($n -Split('\['))[0] -Split('\)'))[1].Trim()
            eCode = (($n -Split('\('))[1] -Split('\)'))[0]
            SN = $sn
        }
        $i = $i + 1
    }
    Write-Host " "
}
$table | select i,cl,Storage,SP,Date,Time,Disk,Error,eCode,SN |
    Export-Csv -Path 'd:\VNX-Errors.csv' -NoTypeInformation -UseCulture -Encoding UTF8
```
Everything was fine; all that remained was to add polish in the form of an automatic email to interested colleagues and minimal formatting of the resulting csv. But (!) it all ran very slowly: collecting a month of data, for example, took about 45 minutes. That did not suit us, because besides the regular reports we wanted to run an analysis for the current year, which would have taken far too long. But as the saying goes: if you criticize, propose. So we started thinking.
Obviously, the code needed optimization and parallelism. In PowerShell we could not get more than 5 simultaneous threads using workflow, and we had not yet dug into alternative methods, so it was decided to move the script logic to R. In the original script the actual polling of the storage systems is done by the NaviSECCli utility, which can also be run from R, so the approach carries over nicely.
No sooner said than done: a couple of days later it was ready!
We decided that the output should be a daily email containing the total error count in the body, some kind of chart of failure counts (so there would be something to show management), and an xlsx workbook as an attachment. We settled on three tabs for the workbook: error statistics for the last 3 days, the same statistics for the last 30 days, and the raw error records. The resulting script had to:
1. load the available controller data from csv;
2. loop over all the controllers in parallel, searching the logs for the required alarm messages;
3. combine the results into a single data frame;
4. process and transform the data;
5. generate the xlsx document;
6. build the chart and save it as png;
7. compose an email containing the collected data;
8. send the email.
The csv with controller addresses, once loaded, looks like this:

```r
# A tibble: 83 x 9
   Name  IP         cl      type   newA      newB      oldA      oldB      cntIP
   <chr> <chr>      <chr>   <chr>  <chr>     <chr>     <chr>     <chr>     <chr>
 1 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 2 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 3 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 4 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 5 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 6 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 7 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 8 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
 9 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
10 XXX   10.***.**~ XclNam~ 5300-1 10.201.1~ 10.201.1~ 10.***.*~ 10.***.*~ 10.***.*~
# ... with 73 more rows
```
To collect the error information, you need to connect to both controllers in turn (the newA and newB columns) using specialized software from EMC, NaviCLI, with certain switches.
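For reference, a manual log query against a single controller looks like this (the example is lifted from a comment in the full script at the end of the article; the address and credentials are placeholders):

```
NaviSECCli.exe -scope 0 -h 10.201.16.15 -user root -password Secrt4yo getlog -date 07/16/2019 07/17/2019
```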
For convenience, after loading we reshape the table so that the IP addresses of both controllers end up in a single column; that way we can make one pass over the whole list instead of two consecutive ones. We do this with the gather function. The topic of "long" versus "wide" data formats is covered very well in the official tidyverse documentation.
We read the data with the read_csv2 function, specifying the column types manually via the col_types parameter. This is good practice, since it can speed up loading considerably. In our case it hardly matters, because the source csv has fewer than 100 rows, but it's a good habit.
```r
# Read the table of VNX controller IP addresses; keep only the
# production clusters and gather both controllers' IPs into one column.
VNX_ip <- vnxIPfilePath %>%
  read_csv2(
    col_types = cols(
      Name = col_character(),
      IP = col_character(),
      cl = col_character(),
      type = col_character(),
      newA = col_character(),
      newB = col_character(),
      oldA = col_character(),
      oldB = col_character()
    )
  ) %>%
  filter(cl %in% productCls) %>%
  gather(key = "cntName", value = "cntIP", 5:6)
```
The output is a data frame like this (note the new cntName and cntIP columns):
```r
# A tibble: 30 x 8
   Name  IP            cl       type  oldA         oldB         cntName cntIP
   <chr> <chr>         <chr>    <chr> <chr>        <chr>        <chr>   <chr>
 1 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 2 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 3 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 4 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 5 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 6 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 7 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 8 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
 9 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
10 XXX   10.***.***.*~ XclNameX 5300~ 10.***.***.~ 10.***.***.~ newA    10.***.***.~
# ... with 20 more rows
```
Now for the most interesting part: parallel computing.
R offers several (actually, many) options for parallel computing. I liked the combination of the foreach and doParallel libraries best; there is plenty of material about them and other parallelization options in R.
In short, it takes only 3 steps:
Step 1. Register the CPU cores for parallel computing via registerDoParallel (in our case, we first detect how many cores there are, just in case):
```r
numCores <- detectCores()
registerDoParallel(numCores)
```
Step 2. Start the loop via foreach (don't forget the %dopar% operator so the loop actually runs in parallel, and specify via the .combine parameter how the results will be collected). In our case .combine = rbind, because each iteration returns a data frame:
```r
# Poll every controller IP in parallel. Each iteration queries one
# controller's log, filters the interesting error codes and returns a
# data frame; %dopar% runs the iterations in parallel. Wrapping the
# call in system.time showed %dopar% to be 4-5 times faster than a
# sequential %do%.
# system.time({
errors_df <- foreach(i = 1:nrow(VNX_ip), .combine = rbind, .packages = "tidyverse") %dopar% {
  errors_raw <- system(
    paste(
      "NaviSECCli.exe -scope 0 -h",
      VNX_ip$cntIP[i],
      "-user myusername -password mypassword getlog -date",
      bigPeriodForm,
      currDateForm
    ),
    intern = TRUE
  ) %>%
    str_subset(pattern = regex(paste0(errorNumbers, collapse = "|")))

  # Proceed only if this controller logged any of the target errors
  if (length(errors_raw) > 0) {
    # The error description is the text between the closing parenthesis
    # of the error code and the opening square bracket of the next
    # field; punctuation and spaces become underscores.
    errorsDescr <- errors_raw %>%
      gsub("(.*\\) )(.*)(\\s+\\[.*)", "\\2", x = .) %>%
      trimws() %>%
      gsub('([[:punct:]])|\\s+', '_', .)

    # Split each line on whitespace and assemble the disk identifier
    errors <- errors_raw %>%
      str_split(pattern = "\\s+", simplify = T) %>%
      as_tibble() %>%
      mutate(Disk = paste0(V4, "_", V6, "_", V8) %>%
               gsub(pattern = "\\([0-9]{3}\\)", replacement = "", x = .))

    # Build the per-controller data frame
    data_frame(cl = VNX_ip$cl[i],
               Storage = VNX_ip$Name[i],
               Date = errors$V1 %>% as.Date(format = "%m/%d/%Y"),
               Time = errors$V2,
               Disk = errors$Disk,
               Error = errorsDescr,
               eCode = errors$V8 %>%
                 str_extract(paste0(errorNumbers, collapse = "|")) %>%
                 str_extract("[0-9]+")) %>%
      mutate(DateTime = as.POSIXct(paste(Date, Time), format = "%Y-%m-%d %H:%M:%S"))
  }
}
# })
```
Step 3. Shut down the parallel cluster we created via stopImplicitCluster():
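```r
# Release the workers registered earlier by registerDoParallel
stopImplicitCluster()
```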
In textual form, the errors are as follows:
```r
head(errors_raw)
[1] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 841d1080 10006 "
[2] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 841e1a00 10006 "
[3] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 8420b600 10006 "
[4] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 84206900 10006 "
[5] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 841fc900 10006 "
[6] "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 841fc000 10006 "
```
Here the values are space-separated and, at first glance, would even paste into csv fine. But it is not so simple. The parsing is tricky because the number of fields per line is not fixed, the error code is glued to the disk number (e.g. Disk 9(801)), and the error description is a variable number of space-separated words.
I won't dwell on the parsing itself, since it's a matter of taste, but I will clarify that the error description had to be cut out as whatever sits between the closing parenthesis of the error code and the opening square bracket of the next value. In the loop this is the errorsDescr variable.
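A minimal illustration of that extraction on one log line (the sample line is taken from the head(errors_raw) output above; assumes the tidyverse is loaded for the pipe):

```r
library(tidyverse)  # for %>%

# One raw log line from the head(errors_raw) output above
line <- "07/13/2019 00:01:46 Bus 0 Enclosure 3 Disk 9(801) Soft SCSI Bus Error [0x00] 841d1080 10006 "

# Keep only the text between the ") " after the error code and the " ["
# of the next field, then replace punctuation and whitespace with "_"
line %>%
  gsub("(.*\\) )(.*)(\\s+\\[.*)", "\\2", x = .) %>%
  trimws() %>%
  gsub("([[:punct:]])|\\s+", "_", .)
#> [1] "Soft_SCSI_Bus_Error"
```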
Another interesting point: for convenience when assembling the final data frame, we iterate not over the column with controller IP addresses (i.e. i = VNX_ip$cntIP) but over the row numbers (i.e. i = 1:nrow(VNX_ip)). This lets us add the cluster name and storage name while building the data frame of parsed errors, via VNX_ip$cl[i] and VNX_ip$Name[i] respectively. Without this we would need joins, which would be slower and read worse in the code.
In the end, we get a data frame (strictly speaking a tibble, but the difference is beyond the scope of this article) containing all the data we need: which error occurred, when, on which storage system and on which disk.
```r
> errors_df
# A tibble: 2,705 x 8
   cl      Storage    Date       Time    Disk  Error       eCode DateTime
   <chr>   <chr>      <date>     <chr>   <chr> <chr>       <chr> <dttm>
 1 XclNam~ XStorageN~ 2019-07-18 12:09:~ 0_1_3 Soft_SCSI_~ 801   2019-07-18 12:09:55
 2 XclNam~ XStorageN~ 2019-07-18 15:09:~ 0_1_3 Soft_SCSI_~ 801   2019-07-18 15:09:56
 3 XclNam~ XStorageN~ 2019-07-18 16:28:~ 0_1_3 Soft_SCSI_~ 801   2019-07-18 16:28:50
 4 XclNam~ XStorageN~ 2019-07-19 06:36:~ 0_1_6 Soft_SCSI_~ 801   2019-07-19 06:36:39
 5 XclNam~ XStorageN~ 2019-07-19 20:57:~ 0_1_6 Soft_Media~ 820   2019-07-19 20:57:35
 6 XclNam~ XStorageN~ 2019-07-22 11:00:~ 0_2_~ Soft_SCSI_~ 801   2019-07-22 11:00:43
 7 XclNam~ XStorageN~ 2019-07-22 11:00:~ 0_2_~ Soft_SCSI_~ 801   2019-07-22 11:00:44
 8 XclNam~ XStorageN~ 2019-07-22 12:02:~ 0_2_~ Soft_SCSI_~ 801   2019-07-22 12:02:31
 9 XclNam~ XStorageN~ 2019-07-23 23:29:~ 0_3_8 Soft_SCSI_~ 801   2019-07-23 23:29:49
10 XclNam~ XStorageN~ 2019-07-13 00:01:~ 0_3_9 Soft_SCSI_~ 801   2019-07-13 00:01:46
# ... with 2,695 more rows
```
The best part is that the whole cycle of polling all the storage systems in parallel takes not 30 minutes, but 30 seconds.
Thankfully, this is not one of those cases where 30 seconds is too fast.
To be fair, the PowerShell code also collected the serial numbers of disks from all the storage systems in its loop, and by the time the code was rewritten in R that data turned out to be redundant. So the runtime comparison is not entirely fair, but it is still impressive.
Data transformation for the xlsx document came down to filtering the source table to the last 3 days and to the last month, and converting the error-name column to "horizontal" format so that each error type gets its own column. A separate function was written for this, to avoid duplicating the same steps twice:
```r
myErrorStats <- function(data, period, orderColname = quo(Soft_Media_Error)) {
  data %>%
    filter(Date > period) %>%
    group_by(cl, Storage, Disk, Error) %>%
    summarise(count = n()) %>%
    spread(Error, count, fill = 0) %>%
    arrange(desc(!!orderColname))
}
```
To put each error type in its own column, the spread function is applied with the extra argument fill = 0, which fills missing values with 0. Without it, if a disk had no errors of some type, the corresponding cells would contain NA.
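A toy sketch of the difference, with column names matching the report (illustrative data; assumes the tidyverse is loaded):

```r
library(tidyverse)

counts <- tibble(
  Disk  = c("0_1_3", "0_1_3", "0_2_5"),
  Error = c("Soft_Media_Error", "Soft_SCSI_Bus_Error", "Soft_Media_Error"),
  count = c(2, 1, 5)
)

# With fill = 0 the missing combination gets 0 instead of NA
counts %>% spread(Error, count, fill = 0)
#> # A tibble: 2 x 3
#>   Disk  Soft_Media_Error Soft_SCSI_Bus_Error
#>   <chr>            <dbl>               <dbl>
#> 1 0_1_3                2                   1
#> 2 0_2_5                5                   0
```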
I also wanted the function to accept the sort column as a parameter while still having a default value for it. This uses dplyr's peculiar tidy evaluation syntax, described in detail in the dplyr programming vignette (https://dplyr.tidyverse.org/articles/programming.html).
In our case, when defining the function parameters, we give one of them a default value quoted with quo() (orderColname = quo(Soft_Media_Error)), and inside the function prefix it with !! to unquote it: arrange(desc(!!orderColname)).
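For example (a sketch; Hard_SCSI_Bus_Error is one of the error-type columns in the resulting table, and errorsByHard is just an illustrative name):

```r
# Default sort column (Soft_Media_Error):
errorsBigPeriod <- errors_df %>% myErrorStats(bigPeriod)

# Overriding it: quo() captures the column name unevaluated, and !!
# unquotes it inside arrange() within the function
errorsByHard <- errors_df %>%
  myErrorStats(bigPeriod, orderColname = quo(Hard_SCSI_Bus_Error))
```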
```r
> errorsBigPeriod
# A tibble: 77 x 7
   cl    Storage Disk  Hard_SCSI_Bus_E~ Recommend_Disk_~ Soft_Media_Error
   <chr> <chr>   <chr>            <dbl>            <dbl>            <dbl>
 1 XclN~ XStora~ 1_1_~                0                1               64
 2 XclN~ XStora~ 0_2_5                0                0               29
 3 XclN~ XStora~ 1_1_~                0                1               29
 4 XclN~ XStora~ 0_3_2                0                0               27
 5 XclN~ XStora~ 0_3_~                1                0               25
 6 XclN~ XStora~ 1_3_5                0                1               23
 7 XclN~ XStora~ 0_2_9                0                0               21
 8 XclN~ XStora~ 0_3_4                0                0               14
 9 XclN~ XStora~ 0_1_~                0                0               14
10 XclN~ XStora~ 1_0_1                0                0               12
# ... with 67 more rows, and 1 more variable: Soft_SCSI_Bus_Error <dbl>
```
I covered building the xlsx document in the article on VM status reports, so I will not dwell on it in detail here. All the code is given at the end of the article.
The important touches that improve the report's readability: the header row is bold, centered, and separated by a border; the columns are auto-sized to their content; and the raw-data sheet is sorted by date and time in descending order.
For the chart, I wanted the total number of errors per day across all storage systems, broken down by error type. As the plotting tool we took the ggplot2 library, the de facto standard.
The first version of the chart showed all the errors in a single panel.
Colleagues said it turned out unreadable. What do they know?!
Still, the remarks were taken into account, and facet_grid was added on top of the standard bars (geom_bar) to split the result into separate panels by error type.
The final result suited everyone.
```r
#### Chart ####
# Total number of errors per day by error type
errorsTotal <- errors_df %>%
  group_by(Date, Error) %>%
  summarise(count = n()) %>%
  # spread(Error, count, fill = 0) %>%
  arrange(desc(Date))

# Build the chart: bars faceted by error type, with the facet order
# fixed via an explicitly leveled factor.
plot <- errorsTotal %>%
  ggplot(aes(x = Date, y = count, fill = Error)) +
  geom_bar(stat = "identity", width = 0.5, color = "grey") +
  theme_minimal() +
  # theme(legend.position="top") +
  scale_color_grey() +
  labs(title = "EMC VNX disk errors",
       subtitle = "by day and error type",
       fill = "Error type") +
  xlab("") +
  ylab("Number of errors") +
  scale_fill_brewer(palette = "Spectral") +
  facet_grid(rows = vars(factor(
    Error,
    levels = c(
      "Soft_SCSI_Bus_Error",
      "Soft_Media_Error",
      "Hard_SCSI_Bus_Error",
      "Recommend_Disk_Replacement"
    )
  )))

# Preview interactively if needed
# plot

# Path for the png that is embedded into the email
plot_filePath <- file.path("results", "plot.png")

# Save the chart as png
ggsave(filename = plot_filePath, plot = plot)
```
A few points of interest about building the chart.
I wanted the panels in a specific order. For this, the grouping parameter of facet_grid had to be passed as a factor, or rather an ordered factor. A factor is a cunning data type in R: a set of values (in our case strings, i.e. character), where the set of allowed values is strictly defined (these are called factor levels), and the levels themselves can be ordered. It sounds complicated, but it falls into place once you notice that month names are a perfect example of an ordered factor: we know which names months can have, and we also know (well, I hope) that January comes first, then February, then March, and so on. The factor here is created on the same principle.
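A quick sketch using base R's built-in month.name constant:

```r
# Levels define the allowed values; ordered = TRUE fixes their order
m <- factor(c("March", "January", "February"),
            levels = month.name, ordered = TRUE)

sort(m)
#> [1] January  February March
#> 12 Levels: January < February < March < ... < December
```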
Composing and sending the email, as well as creating the task in the Windows Task Scheduler, were also covered in the article on VM status reports. We just substitute a few variables into the text and format it more or less clearly, not forgetting the attachment.
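For completeness, a minimal sketch of scheduling the daily run with the taskscheduleR package (it appears in the sessionInfo at the end of the article); the task name and script path here are illustrative assumptions:

```r
library(taskscheduleR)

# Run the report script every morning at 07:00
# (taskname and rscript path are placeholders)
taskscheduler_create(
  taskname  = "VNX_disk_errors_report",
  rscript   = "C:\\Scripts\\VNX_disks_check\\report.R",
  schedule  = "DAILY",
  starttime = "07:00"
)
```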
Once again, R proved to be a universal tool for everyday tasks and for visualizing their results, and with parallel computing enabled it is a fast one, too.
Practice also showed that PowerShell is extremely slow at parsing logs and turning them into a readable format.
Many thanks to everyone who has read so many letters to the end.
The full script:

```r
#### ENV ####
# Set the working directory explicitly (a scheduled task would
# otherwise start in system32)
setwd("C:\\Scripts\\VNX_disks_check/")

# Libraries
library(tidyverse)
library(lubridate)
library(zoo)
library(stringi)
library(xlsx)
library(mailR)
library(foreach)
library(doParallel)

#### CONST ####
# Path to the csv with the controller addresses
vnxIPfilePath <- file.path("data", "VNX-IP.csv")
# Reporting periods
bigPeriod <- Sys.Date() - 30
smallPeriod <- Sys.Date() - 3
# Production clusters to include in the report
productCls <- c("name1", "name2", "name3")

# Read the table of VNX controller IP addresses; keep only the
# production clusters and gather both controllers' IPs into one column.
VNX_ip <- vnxIPfilePath %>%
  read_csv2(
    col_types = cols(
      Name = col_character(),
      IP = col_character(),
      cl = col_character(),
      type = col_character(),
      newA = col_character(),
      newB = col_character(),
      oldA = col_character(),
      oldB = col_character()
    )
  ) %>%
  filter(cl %in% productCls) %>%
  gather(key = "cntName", value = "cntIP", 5:6)

#### Polling the VNX systems ####
# Example of a manual query (for reference):
# NaviSECCli.exe -scope 0 -h 10.201.16.15 -user root -password Secrt4yo getlog -date 07/16/2019 07/17/2019
# The error codes we look for, escaped for use in a regex
errorNumbers <- c("\\(820\\)", "\\(803\\)", "\\(801\\)", "\\(920\\)", "\\(901\\)")

## Parallelization ##
# Detect the available cores and register them
numCores <- detectCores()
registerDoParallel(numCores)

# Dates for NaviCLI; the end date is shifted one day forward so that
# today's events also make it into the report.
bigPeriodForm <- bigPeriod %>% format(format = "%m/%d/%Y")
currDateForm <- (Sys.Date() + 1) %>% format(format = "%m/%d/%Y")

# Poll every controller IP in parallel. Each iteration queries one
# controller's log, filters the interesting error codes and returns a
# data frame; %dopar% runs the iterations in parallel. Wrapping the
# call in system.time showed %dopar% to be 4-5 times faster than a
# sequential %do%.
# system.time({
errors_df <- foreach(i = 1:nrow(VNX_ip), .combine = rbind, .packages = "tidyverse") %dopar% {
  errors_raw <- system(
    paste(
      "NaviSECCli.exe -scope 0 -h",
      VNX_ip$cntIP[i],
      "-user myusername -password mypassword getlog -date",
      bigPeriodForm,
      currDateForm
    ),
    intern = TRUE
  ) %>%
    str_subset(pattern = regex(paste0(errorNumbers, collapse = "|")))

  # Proceed only if this controller logged any of the target errors
  if (length(errors_raw) > 0) {
    # The error description is the text between the closing parenthesis
    # of the error code and the opening square bracket of the next
    # field; punctuation and spaces become underscores.
    errorsDescr <- errors_raw %>%
      gsub("(.*\\) )(.*)(\\s+\\[.*)", "\\2", x = .) %>%
      trimws() %>%
      gsub('([[:punct:]])|\\s+', '_', .)

    # Split each line on whitespace and assemble the disk identifier
    errors <- errors_raw %>%
      str_split(pattern = "\\s+", simplify = T) %>%
      as_tibble() %>%
      mutate(Disk = paste0(V4, "_", V6, "_", V8) %>%
               gsub(pattern = "\\([0-9]{3}\\)", replacement = "", x = .))

    # Build the per-controller data frame
    data_frame(cl = VNX_ip$cl[i],
               Storage = VNX_ip$Name[i],
               Date = errors$V1 %>% as.Date(format = "%m/%d/%Y"),
               Time = errors$V2,
               Disk = errors$Disk,
               Error = errorsDescr,
               eCode = errors$V8 %>%
                 str_extract(paste0(errorNumbers, collapse = "|")) %>%
                 str_extract("[0-9]+")) %>%
      mutate(DateTime = as.POSIXct(paste(Date, Time), format = "%Y-%m-%d %H:%M:%S"))
  }
}
# })

# Shut down the implicit cluster; without this the workers linger.
stopImplicitCluster()

#### Data transformation ####
# Helper that filters by period, counts errors per disk, spreads error
# types into columns and sorts by the given column via tidy evaluation.
# See https://dplyr.tidyverse.org/articles/programming.html
myErrorStats <- function(data, period, orderColname = quo(Soft_Media_Error)) {
  data %>%
    filter(Date > period) %>%
    group_by(cl, Storage, Disk, Error) %>%
    summarise(count = n()) %>%
    spread(Error, count, fill = 0) %>%
    arrange(desc(!!orderColname))
}

# Error statistics for the last 30 days
errorsBigPeriod <- errors_df %>% myErrorStats(bigPeriod)
# Error statistics for the last 3 days
errorsSmallPeriod <- errors_df %>% myErrorStats(smallPeriod)

# Path for the resulting xlsx attachment
errors_filePath <- file.path("results", "VNX_Errors.xlsx")

#### Building the xlsx ####
# Create the workbook
wb <- createWorkbook(type = "xlsx")

# Styles for the row and column headers
TABLE_ROWNAMES_STYLE <- CellStyle(wb) + Font(wb, isBold = TRUE)
TABLE_COLNAMES_STYLE <- CellStyle(wb) + Font(wb, isBold = TRUE) +
  Alignment(wrapText = TRUE, horizontal = "ALIGN_CENTER") +
  Border(color = "black",
         position = c("TOP", "BOTTOM"),
         pen = c("BORDER_THIN", "BORDER_THICK"))

# Create the three sheets (the original sheet names were in Russian)
sheetSmall <- createSheet(wb, sheetName = "Errors last 3 days")
sheetBig <- createSheet(wb, sheetName = "Errors last 30 days")
sheetRaw <- createSheet(wb, sheetName = "Raw data")

## Fill the sheets
addDataFrame(
  errorsSmallPeriod %>% as.data.frame(),
  sheetSmall,
  startRow = 1, startColumn = 1,
  row.names = FALSE, byrow = FALSE,
  colnamesStyle = TABLE_COLNAMES_STYLE,
  rownamesStyle = TABLE_ROWNAMES_STYLE
)
addDataFrame(
  errorsBigPeriod %>% as.data.frame(),
  sheetBig,
  startRow = 1, startColumn = 1,
  row.names = FALSE, byrow = FALSE,
  colnamesStyle = TABLE_COLNAMES_STYLE,
  rownamesStyle = TABLE_ROWNAMES_STYLE
)
# Raw data, sorted by DateTime in descending order
addDataFrame(
  errors_df %>% as.data.frame() %>% arrange(desc(DateTime)),
  sheetRaw,
  startRow = 1, startColumn = 1,
  row.names = FALSE, byrow = FALSE,
  colnamesStyle = TABLE_COLNAMES_STYLE,
  rownamesStyle = TABLE_ROWNAMES_STYLE
)

# Auto-size the columns so everything is readable
autoSizeColumn(sheet = sheetSmall, colIndex = c(1:ncol(errorsSmallPeriod)))
autoSizeColumn(sheet = sheetBig, colIndex = c(1:ncol(errorsBigPeriod)))
autoSizeColumn(sheet = sheetRaw, colIndex = c(1:ncol(errors_df)))

# Remove the previous file if it exists, then save the workbook
if (file.exists(errors_filePath)) {file.remove(errors_filePath)}
saveWorkbook(wb, errors_filePath)

#### Chart ####
# Total number of errors per day by error type
errorsTotal <- errors_df %>%
  group_by(Date, Error) %>%
  summarise(count = n()) %>%
  # spread(Error, count, fill = 0) %>%
  arrange(desc(Date))

# Build the chart: bars faceted by error type, facet order fixed via
# an explicitly leveled factor
plot <- errorsTotal %>%
  ggplot(aes(x = Date, y = count, fill = Error)) +
  geom_bar(stat = "identity", width = 0.5, color = "grey") +
  theme_minimal() +
  # theme(legend.position="top") +
  scale_color_grey() +
  labs(title = "EMC VNX disk errors",
       subtitle = "by day and error type",
       fill = "Error type") +
  xlab("") +
  ylab("Number of errors") +
  scale_fill_brewer(palette = "Spectral") +
  facet_grid(rows = vars(factor(
    Error,
    levels = c(
      "Soft_SCSI_Bus_Error",
      "Soft_Media_Error",
      "Hard_SCSI_Bus_Error",
      "Recommend_Disk_Replacement"
    )
  )))

# Preview interactively if needed
# plot

# Path for the png that is embedded into the email
plot_filePath <- file.path("results", "plot.png")

# Save the chart as png
ggsave(filename = plot_filePath, plot = plot)

#### Email ####
# Recipients
emailRecepientsList <- c("sendall-tech@domain.ru")
# Connection parameters
emailParams <- list(
  from = "login@domain.ru",
  to = emailRecepientsList,
  smtpParams = list(
    host.name = "10.10.10.1",
    port = 25,
    user.name = "login@domain.ru",
    passwd = "mypassword",
    ssl = FALSE
  )
)

# Total number of errors for the email body: sum over the error-type
# columns of the 3-day table (the first three columns are identifiers)
errorsTotal <- errorsSmallPeriod[-c(1,2,3)] %>% sum()

# Email body (the original wording was in Russian; reconstructed here)
emailBody <- paste0(
  '<html>
  <h3>Hello, colleagues.</h3>
  <p>Over the last 3 days, <strong>', errorsTotal, '</strong> errors were registered on the EMC VNX storage systems.</p>
  <p>Details are in the attached workbook, which has 3 tabs:
  <ul>
  <li>Errors for the last 3 days, sorted by <strong>Soft_Media_Error</strong>.</li>
  <li>Errors for the last 30 days, sorted by <strong>Soft_Media_Error</strong>.</li>
  <li>Raw data, sorted by <strong>date</strong>.</li>
  </ul>
  The chart below shows the daily totals.</p>
  <p><img src="', plot_filePath, '"></p>
  </html>'
)

#### Sending ####
send.mail(from = emailParams$from,
          to = emailParams$to,
          subject = "EMC VNX disk errors",
          body = emailBody,
          encoding = "utf-8",
          html = TRUE,
          inline = TRUE,
          smtp = emailParams$smtpParams,
          authenticate = TRUE,
          send = TRUE,
          attach.files = c(errors_filePath),
          debug = FALSE)
```
```r
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251    LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] taskscheduleR_1.4 pander_0.6.3      doParallel_1.0.14 iterators_1.0.10  foreach_1.4.4     mailR_0.4.1      
 [7] xlsx_0.6.1        stringi_1.4.3     zoo_1.8-6         lubridate_1.7.4   wesanderson_0.3.6 forcats_0.4.0    
[13] stringr_1.4.0     dplyr_0.8.3       purrr_0.3.2       readr_1.3.1       tidyr_0.8.3       tibble_2.1.3     
[19] ggplot2_3.2.0     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5   reshape2_1.4.3     rJava_0.9-11       haven_2.1.1        lattice_0.20-38    colorspace_1.4-1  
 [7] vctrs_0.2.0        generics_0.0.2     utf8_1.1.4         rlang_0.4.0        R.oo_1.22.0        pillar_1.4.2     
[13] glue_1.3.1         withr_2.1.2        R.utils_2.9.0      RColorBrewer_1.1-2 modelr_0.1.4       readxl_1.3.1     
[19] plyr_1.8.4         munsell_0.5.0      gtable_0.3.0       cellranger_1.1.0   rvest_0.3.4        R.methodsS3_1.7.1
[25] codetools_0.2-16   labeling_0.3       fansi_0.4.0        xlsxjars_0.6.1     broom_0.5.2        Rcpp_1.0.1       
[31] scales_1.0.0       backports_1.1.4    jsonlite_1.6       digest_0.6.20      hms_0.5.0          grid_3.5.3       
[37] cli_1.1.0          tools_3.5.3        magrittr_1.5       lazyeval_0.2.2     crayon_1.3.4       pkgconfig_2.0.2  
[43] zeallot_0.1.0      data.table_1.12.2  xml2_1.2.0         assertthat_0.2.1   httr_1.4.0         rstudioapi_0.10  
[49] R6_2.4.0           nlme_3.1-137       compiler_3.5.3    
```
Source: https://habr.com/ru/post/461441/