
Oracle ASM in Azure



Now that we're doing more in the cloud I've had to dust off my old sysadmin hat and start doing some of the things I'd forgotten about from when I was a sysadmin (10+ years ago). This week I've been setting up ASM instances in Azure, so I had to do all the work around making the disk devices available to ASM.

On premises I'd just pick up the phone and say "create these devices for me" - now I actually have to work out how to do it.

So here is my quick guide to setting up ASM devices in Azure (most of this will be the same for other clouds or on premises). In this basic example I'm just going to add 2 disks of 1TB each, one for the DATA disk group and one for the FRA disk group.

So assuming you already have a basic server up and running, you now need to add 2 disks to it. To do this go to the Azure portal, click on the disks icon and then add two disks, making sure to click the save button or nothing will actually happen (I made this mistake, so that's why I specifically mention it.....).
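As an aside, if you prefer the command line to clicking around the portal, something like this Azure CLI sketch should do the same job - the resource group, VM and disk names here are just made-up placeholders, and the exact flags may differ slightly between CLI versions:

az vm disk attach --resource-group my-rg --vm-name my-asm-vm --name asm-data-disk --new --size-gb 1024
az vm disk attach --resource-group my-rg --vm-name my-asm-vm --name asm-fra-disk --new --size-gb 1024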

The Azure screen will then look like this (showing 3 additional disks: 1 was for the /oracle filesystem which I'd already dealt with the previous day, and the other two will be used for the ASM example below).



Now the disks are saved they are provisioned to the server and you should be able to see them by default without having to do anything else (at least it worked this way on Red Hat 7.2 for me). However, on other or older OS's you may need to scan the SCSI bus for the operating system to discover them. The command to do this is rescan-scsi-bus.sh, which is in the sg3_utils package (at least for Red Hat) if it is not already installed.

This command will basically just probe for any new or changed devices attached to the system.
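If rescan-scsi-bus.sh isn't to hand, the same effect can usually be had by poking the SCSI hosts directly through sysfs - a rough sketch, looping over whatever hosts the box has:

for host in /sys/class/scsi_host/host*; do
  echo "- - -" > $host/scan
done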

To check what has been discovered you can look in a couple of places - you can either run

[root]# dmesg |grep SCSI
[    0.211099] SCSI subsystem initialized
[    1.020215] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    2.892767] sd 5:0:1:0: [sdb] Attached SCSI disk
[    2.893235] sd 2:0:0:0: [sda] Attached SCSI disk
[  363.573682] sd 4:0:0:0: [sdc] Attached SCSI disk
[87200.690290] sd 4:0:0:1: [sdd] Attached SCSI disk
[87201.360334] sd 4:0:0:2: [sde] Attached SCSI disk

or

[root]# grep SCSI /var/log/messages
Feb 21 09:17:43 localhost kernel: SCSI subsystem initialized
Feb 21 09:17:43 localhost kernel: Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
Feb 21 09:17:43 localhost kernel: sd 5:0:1:0: [sdb] Attached SCSI disk
Feb 21 09:17:43 localhost kernel: sd 2:0:0:0: [sda] Attached SCSI disk
Feb 21 09:17:49 localhost smartd[521]: Monitoring 0 ATA and 0 SCSI devices
Feb 21 09:23:41 localhost kernel: sd 4:0:0:0: [sdc] Attached SCSI disk
Feb 22 09:30:58 localhost kernel: sd 4:0:0:1: [sdd] Attached SCSI disk
Feb 22 09:30:59 localhost kernel: sd 4:0:0:2: [sde] Attached SCSI disk

The messages file is perhaps clearer as it shows the timestamps at which the 2 disks were discovered, which gives you more confidence that you have the right ones.

So in the case above sdd and sde were the newly discovered disks, and you'll see /dev/sdd and /dev/sde have been created.

Another useful command at this point is lsblk - this gives some more detail about the disks and their current use - so for my server if I run that I see

[root]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0      2:0    1    4K  0 disk
sda      8:0    0 29.3G  0 disk
├─sda1   8:1    0  500M  0 part /boot
└─sda2   8:2    0 28.8G  0 part /
sdb      8:16   0   14G  0 disk
└─sdb1   8:17   0   14G  0 part /mnt/resource
sdc      8:32   0  100G  0 disk
└─sdc1   8:33   0  100G  0 part /oracle
sdd      8:48   0 1023G  0 disk
sde      8:64   0 1023G  0 disk
sr0     11:0    1  1.1M  0 rom

We can see from this that sda contains / and /boot, sdb has /mnt/resource, sdc has /oracle (which was another disk I added previously) and sdd/sde are currently not partitioned and have nothing on them (at least from a filesystem point of view).

Another point that's worth mentioning here is that Red Hat has only discovered 1 path to each disk - there is no multipathing on this server. As an example of somewhere we do have multipathing, this is what the output looks like there:

[root@multipath ~]# lsblk
NAME                        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                           8:0    0  100G  0 disk
└─mpatha (dm-0)             253:0    0  100G  0 mpath
  ├─mpathap1 (dm-2)         253:2    0  512M  0 part  /boot
  ├─mpathap2 (dm-4)         253:4    0 34.1G  0 part
  │ └─vg00-swapvol (dm-9)   253:9    0    4G  0 lvm   [SWAP]
  └─mpathap3 (dm-6)         253:6    0 65.4G  0 part
    ├─vg00-rootvol (dm-8)   253:8    0    5G  0 lvm   /
    ├─vg00-swapvol (dm-9)   253:9    0    4G  0 lvm   [SWAP]
    ├─vg00-varvol (dm-10)   253:10   0    4G  0 lvm   /var
    ├─vg00-crashvol (dm-11) 253:11   0 29.3G  0 lvm   /var/crash
    ├─vg00-auditvol (dm-12) 253:12   0  256M  0 lvm   /var/log/audit
    ├─vg00-tmpvol (dm-13)   253:13   0    2G  0 lvm   /tmp
    ├─vg00-homevol (dm-14)  253:14   0    1G  0 lvm   /home
    └─vg00-lvoracle (dm-15) 253:15   0   20G  0 lvm   /oracle
sdd                           8:48   0  100G  0 disk
└─mpatha (dm-0)             253:0    0  100G  0 mpath
  ├─mpathap1 (dm-2)         253:2    0  512M  0 part  /boot
  ├─mpathap2 (dm-4)         253:4    0 34.1G  0 part
  │ └─vg00-swapvol (dm-9)   253:9    0    4G  0 lvm   [SWAP]
  └─mpathap3 (dm-6)         253:6    0 65.4G  0 part
    ├─vg00-rootvol (dm-8)   253:8    0    5G  0 lvm   /
    ├─vg00-swapvol (dm-9)   253:9    0    4G  0 lvm   [SWAP]
    ├─vg00-varvol (dm-10)   253:10   0    4G  0 lvm   /var
    ├─vg00-crashvol (dm-11) 253:11   0 29.3G  0 lvm   /var/crash
    ├─vg00-auditvol (dm-12) 253:12   0  256M  0 lvm   /var/log/audit
    ├─vg00-tmpvol (dm-13)   253:13   0    2G  0 lvm   /tmp
    ├─vg00-homevol (dm-14)  253:14   0    1G  0 lvm   /home
    └─vg00-lvoracle (dm-15) 253:15   0   20G  0 lvm   /oracle

You can see that exactly the same thing is accessible via sda and sdd - this is where multipath comes in: it gives you a single device name that covers both paths, giving failover and improved performance (in the output above that name is mpatha). For this Azure example I'm ignoring multipath as it's not there on my Azure server (at least the small one I chose), but be aware that if you do have multipath, ASM should do its access via the multipath device and not the individual disk names that I show below.

OK, now we know the disk names we have the option to fdisk them to partition the disks - this isn't strictly necessary as we want to use the whole disk anyway, but it seems to be good practice as it at least writes a partition table and makes Linux (and actually people) aware that the disk is being used for something.

To do the fdisk I just run

fdisk /dev/sdd

Then go through the menu options choosing n for new and accepting all the defaults - when it stops prompting you for more answers it's ready to write the results back to the disk, which you do by typing w.
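If you'd rather script that than answer fdisk's prompts, something like this parted one-liner should produce the same single full-size partition (this assumes you're happy with a gpt label - adjust if not):

parted -s /dev/sdd mklabel gpt mkpart primary 0% 100%
parted -s /dev/sde mklabel gpt mkpart primary 0% 100%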

After you've done that for both disks you'll see the output of lsblk is slightly different

[root@]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0      2:0    1    4K  0 disk
sda      8:0    0 29.3G  0 disk
├─sda1   8:1    0  500M  0 part /boot
└─sda2   8:2    0 28.8G  0 part /
sdb      8:16   0   14G  0 disk
└─sdb1   8:17   0   14G  0 part /mnt/resource
sdc      8:32   0  100G  0 disk
└─sdc1   8:33   0  100G  0 part /oracle
sdd      8:48   0 1023G  0 disk
└─sdd1   8:49   0 1023G  0 part
sde      8:64   0 1023G  0 disk
└─sde1   8:65   0 1023G  0 part
sr0     11:0    1  1.1M  0 rom

We have 2 new lines reflecting the fact the disks are partitioned; if you look in /dev you'll also see new devices called /dev/sdd1 and /dev/sde1 (or whatever your device names were). If you create more than one partition these will show up as /dev/sdd2, /dev/sdd3 etc - you get the idea.

OK great we're now good to go right?

Well, you could, but you have 2 main issues:

1) Permissions won't allow oracle to use the devices
2) The device names can change - for example if there is an issue talking to /dev/sdb, that device name might be reused for something else, or the device order could get randomly shuffled - this is no good; we need persistent names here.

There are 2 main solutions to this

1) ASMLib
2) udev

Now ASMLib was meant to be Oracle's solution to this, but I never really saw the point of that extra layer of complexity (and something else to go wrong) - in fact at one point in the past it was pulled but then came back. It's still there as an option if you want to use it, but I struggle to see the benefit when you can just do it with the built-in tool udev.

So ignoring ASMLib, here is how you do it with udev.

Now the key bit of information required by udev to make all this work, and to guarantee things end up with the same name all the time, is the WWN (World Wide Name) of the disk - this is essentially like the MAC address of a network card: a globally unique value that can only be assigned to one thing. This means we can build a set of rules based on it to give us a consistent device name.

So how do we find this WWN?

Well, there are a couple of ways - either this, which shows the WWN for each disk at the end of the line:

[root@]# lsscsi -i
[1:0:0:0]    cd/dvd  Msft     Virtual CD/ROM   1.0   /dev/sr0   -
[2:0:0:0]    disk    Msft     Virtual Disk     1.0   /dev/sda   3600224805f644b4df8f61359aa823ccd
[4:0:0:0]    disk    Msft     Virtual Disk     1.0   /dev/sdc   36002248061b57d183e2b7e3923d5bd5c
[4:0:0:1]    disk    Msft     Virtual Disk     1.0   /dev/sdd   360022480a89ae89fa85a9e69fd66934f
[4:0:0:2]    disk    Msft     Virtual Disk     1.0   /dev/sde   3600224809d6a88fc75be22809c742190
[5:0:1:0]    disk    Msft     Virtual Disk     1.0   /dev/sdb   36002248064f6b661ed183884d69ac5db

Or, for a specific disk, this command (which is actually what udev itself makes use of) - the path to scsi_id varies between versions so check you have the right one:

[root@]#  /usr/lib/udev/scsi_id -g -u -d /dev/sde1
3600224809d6a88fc75be22809c742190

OK - so now we have the WWNs, how do we make use of udev to give us some disk names?

We need to add an additional 'rules' file, which is processed at startup and will discover the disks and create the symlinks and permissions we define. So let's create that file - I name it starting with 99 so it's the last rule file that gets processed.

vi /etc/udev/rules.d/99-oracle-asmdevices.rules

Inside this file I add the following 2 lines of configuration

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="360022480a89ae89fa85a9e69fd66934f", SYMLINK+="oracleasm-data1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="3600224809d6a88fc75be22809c742190", SYMLINK+="oracleasm-fra1", OWNER="oracle", GROUP="dba", MODE="0660"

So let's examine that a little more - what's it actually doing? If we look at the first line, it's doing the following:

1) Look at the devices the kernel has discovered matching sd?1 (note the rule is only looking at partition 1 and nothing else)
2) Probe each one with the scsi_id command to get its WWN
3) If the WWN matches the value I have, then do the following:
  • create a symbolic link called oracleasm-data1 (under /dev)
  • set the owner to oracle
  • set the group to dba
  • set the permissions to 0660
To check you configured the rules file correctly you can run


udevadm test /block/sde/sde1

Note the use of the /block prefix there rather than /dev - this will give lots of nice output telling you what it's doing. If you just want to force it to re-read all the rules then run

udevadm control --reload-rules
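One thing to be aware of (a general udev observation rather than anything Azure-specific): reloading the rules doesn't always re-evaluate devices that are already present, so if the symlinks don't appear you may also need to trigger a change event, e.g.

udevadm trigger --type=devices --action=change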

After that is complete you'll see the following symlinks created under /dev

[root@dev]# ls -l oracle*
lrwxrwxrwx. 1 root root 4 Feb 22 13:20 oracleasm-data1 -> sdd1
lrwxrwxrwx. 1 root root 4 Feb 22 13:18 oracleasm-fra1 -> sde1

which point at these devices - which we can see now have the right permissions.

[root@dev]# ls -l sdd1 sde1
brw-rw----. 1 oracle dba 8, 49 Feb 22 13:20 sdd1
brw-rw----. 1 oracle dba 8, 65 Feb 22 13:20 sde1

So looking good - now we just have to set up ASM. I'll do this using asmca in silent mode with the following command string pointing at the symlink names (I've already got the GI installed via a clone command as detailed here).

asmca -silent -configureASM -sysAsmPassword password here -asmsnmpPassword password here -diskString '/dev/oracleasm*' -diskGroupName DATA -disk '/dev/oracleasm-data1' -redundancy EXTERNAL -diskGroupName FRA -disk '/dev/oracleasm-fra1' -redundancy EXTERNAL

So in this case I'm explicitly pointing DATA at the first symlink and FRA at the second symlink - the disk discovery path for new devices is anything under /dev/oracleasm*.
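If you want to sanity check what ASM ended up with, a couple of queries along these lines against the ASM instance should show the disks and disk groups it discovered:

SQL> select group_number, name, path, header_status from v$asm_disk;
SQL> select name, state, total_mb, free_mb from v$asm_diskgroup;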

And there we have it - how to add ASM devices with consistent names to a Red Hat machine in Azure and then get ASM to use them.




Oracle ASM in Azure with premium (SSD) disks - warning



So after following my previous notes on installing ASM into Azure, I repeated everything but using premium (SSD) disks and all appeared to be well - until, that is, I did a Data Pump load into it and put some stress on the system - then (and this was consistent, every time) this happened....

ORA-00354: corrupt redo log block header 
ORA-00353: log corruption near block 196020 change 378858 time 02/23/2017 23:54:17 
ORA-00312: online log 2 thread 1: '+FRA/AZAPA/ONLINELOG/group_2.257.936744263' 
ORA-00312: online log 2 thread 1: '+DATA/AZAPA/ONLINELOG/group_2.259.936744261' 



Nice eh - I don't think I've ever seen that before in 20 years of doing this.....

So what is going on - a Red Hat bug? An Azure bug? An Oracle bug even?

Well, I started off by raising a ticket with all three to see where it went, and was worried that this was going to turn into a finger-pointing exercise.

Well surprisingly (after a slow start) Oracle came up with the solution and some interesting information at the same time.

First up was the interesting info



Quite a surprise that one!

So then I set about provisioning a new OEL server just to make sure I was on a fully 'certified/supported' platform - whatever the subtle difference between those is, I'm not sure.

After re-running the test case the same corruption happened, so it seems it's not a Red Hat-specific issue.

So now on to the next bit of information from Oracle - which surprisingly was hard to find without the direct links that were posted in the SR.

The problem was actually caused by the way data is written to SSDs - they don't use the same block size as traditional magnetic disks (512 bytes); everything is done in 4096-byte sectors. For most Oracle files this isn't an issue as they write in larger chunks anyway (datafiles, controlfiles etc.) - however for one very important process (LGWR) it is an issue, and this is what was causing the corruption of the redo logs I was seeing.

OK - so now we know the cause what's the fix?

Well, a couple of things, as it seems ASM is not auto-detecting the fact that these SSDs want to work with 4K. The first step is to make sure the disk groups are created with the right sector size (4K) - to do this we run the create diskgroup command with an extra attribute - however by default it won't accept it - see below.


SQL>  create diskgroup data external redundancy disk '/dev/oracleasm-fra1' attribute 'sector_size'='4096';
 create diskgroup data external redundancy disk '/dev/oracleasm-fra1' attribute 'sector_size'='4096'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created

ORA-15038: disk '/dev/oracleasm-fra1' mismatch on 'Sector Size' with target disk group [512] [4096]

So - it's still refusing to use 4K - I'm not sure if this is just a feature of the disks in Azure or a general problem with SSDs - anyway, the way to force ASM to work in 4K is like this:

SQL> alter system set "_disk_sector_size_override"=TRUE scope=spfile
  2  /

System altered.

With that parameter set (and ASM restarted to pick it up), the create diskgroup with the 4K sector size goes through - so the ASM disk groups are now fine.
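To double check what sector size the disk groups actually ended up with, a quick query like this against the ASM instance should confirm it:

SQL> select name, sector_size from v$asm_diskgroup;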

However there is one more step we have to be mindful of when adding the redo logs - and here it is:

SQL> create database character set utf8 logfile group 1 size 1G blocksize 4096,group 2 size 1G blocksize 4096;

Database created.

Now I did that in the create database statement (which is probably wise unless you are switching to SSD later on) - the key bit is the optional blocksize 4096 clause (the default is 512) - now the redo logs will be written in 4K blocks rather than 512 bytes.
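And to confirm the redo logs really are using 4K blocks, something like this from the database should show it:

SQL> select group#, bytes/1024/1024 as size_mb, blocksize from v$log;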

So after those changes I re-ran the import and am pleased to say all looks good - no sign of the corruption.

Now I just want to do some performance testing of the various ASM/non-ASM options to see which gives us the best performance. I would hope ASM should be faster than any other option.

Connection manager - why does this seem to be a secret feature?



While looking into an issue with not having routing available to a certain network, one of my colleagues suggested Connection Manager - something he'd just found when googling the problem. Now I'd heard of this from many years ago (I thought with reference to trying to communicate with some other strange network type) but I knew basically nothing about it.

I was initially just going to suggest how we'd got round this sort of thing before, like this: http://dbaharrison.blogspot.de/2016/02/sneaking-past-firewalls-to-database.html

but then I thought I'd check what Connection Manager was all about (it was also mentioned to me when I presented the technique above at a user group meeting - which I then promptly forgot about 5 minutes later, until this triggered the memory again).

I then did a quick bit of googling myself and found a couple of articles -

This one by Arup http://arup.blogspot.de/2011/08/setting-up-oracle-connection-manager.html

and this one by Martin https://martincarstenbach.wordpress.com/2011/07/11/using-connection-manager-to-protect-a-database/

Both are pretty old, but both useful. What confused me though is that they both seem to have fixed the same issue but used Connection Manager in quite different ways. Arup's doesn't need any changes to the database you want to connect to - Connection Manager just acts as a middle man, handing the connection over to the next hop in the chain. In Martin's case the remote database actually registers itself back with the Connection Manager (similar to how normal dynamic listener registration works).

Choosing the laziest option I went with how Arup had done it - so here is how I implemented my version of that.

Here is the infrastructure I had (nice drawing skills eh?)



In this case my client is in our corporate network; the hop server is in a network in Azure that is 'privately peered' to our corporate network - i.e. it appears as just an internal address to us and is accessed over a private ExpressRoute connection. The final destination database is also in Azure but is in a network that is not directly peered, so a direct client-to-database connection is not possible.

So how did I go about this then?

Well, the first issue was actually tracking down the Connection Manager software.....

It's not included with a full EE install surprisingly; it's also not included with a full admin client install - so where the hell is it? Well, it turns out you get it from a custom install of the client where you specifically choose it - which kind of hints to me that it's not widely used.......

Anyway, once I tracked it down I installed it on the 'hop' server in the picture above - using the 12.2 client version (again being lazy - this was the easiest to find on the website).

Once the client was installed I had a look and sure enough the cmctl utility is there and we're ready to go.

In the $ORACLE_HOME/network/admin directory I created the following config file, cman.ora:

CMAN_thishostname =
  (configuration=
    (address=
      (protocol=tcp)(host=thishostname)(port=1555)
    )
    (rule_list=
      (rule=(src=thishostname)(dst=127.0.0.1)(srv=cmon)(act=accept))
      (rule=(src=*)(dst=destdbhostname)(srv=*)(act=accept))
    )
  )

This is largely based on the 2 blogs above. The first bit of config says I want Connection Manager on port 1555 (a port picked at random - it has no special meaning); the second part adds some rules. The first rule is required for the thing to just work (well, I didn't try without it, I just went with what Martin said); the second rule says allow connections from anywhere to destdbhostname for any service - this is basically a firewall rule more than anything else.

Once that is saved I just run

cmctl startup (as I used the default name for the configuration, CMAN_thishostname)


CMCTL for Linux: Version 12.2.0.1.0 - Production on 12-MAR-2017 13:17:00

Copyright (c) 1996, 2016, Oracle.  All rights reserved.

Current instance CMAN_thishostname is not yet started
Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=thishostname)(port=1555)))
Starting Oracle Connection Manager instance CMAN_thishostname. Please wait...
CMAN for Linux: Version 12.2.0.1.0 - Production
Status of the Instance
----------------------
Instance name             cman_thishostname
Version                   CMAN for Linux: Version 12.2.0.1.0 - Production
Start date                12-MAR-2017 13:17:00
Uptime                    0 days 0 hr. 0 min. 9 sec
Num of gateways started   2
Average Load level        0
Log Level                 SUPPORT
Trace Level               OFF
Instance Config file      /oracle/12.2.0.1.client/network/admin/cman.ora
Instance Log directory    /oracle/diag/netcman/thishostname/cman_thishostname/alert
Instance Trace directory  /oracle/diag/netcman/thishostname/cman_thishostname/trace
The command completed successfully.


And all looks good - no errors

Now all I need is a special-format tnsnames entry like the following on my client:

DESTDB =
  (DESCRIPTION =
    (SOURCE_ROUTE = YES)
    (ADDRESS =
      (PROTOCOL = TCP)(HOST = hopservername)(PORT = 1555)
    )
    (ADDRESS =
      (PROTOCOL = TCP)(HOST = destdbname)(PORT = 1521)
    )
    (CONNECT_DATA =
      (SERVICE_NAME=DESTDBSERVICE)
    )
  )

Now I can just connect as normal to the remote database that I can't reach directly. The clever part in the tnsnames is the SOURCE_ROUTE=YES - that tells it to go to the first address in the list (Connection Manager) and then from there attempt to connect to the second address in the config - and indeed it does that and it works fine.

Reading the notes seems to imply that you could have multiple levels of Connection Manager acting as a series of hops down the line until you get to where you want - see the sketch below.
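I haven't actually tried it, but based on that a chained setup would presumably just be more addresses in the SOURCE_ROUTE list, with each hop handing the connection on to the next - a rough sketch (all hostnames here are made up):

DESTDB_CHAINED =
  (DESCRIPTION =
    (SOURCE_ROUTE = YES)
    (ADDRESS = (PROTOCOL = TCP)(HOST = cman-hop1)(PORT = 1555))
    (ADDRESS = (PROTOCOL = TCP)(HOST = cman-hop2)(PORT = 1555))
    (ADDRESS = (PROTOCOL = TCP)(HOST = destdbname)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DESTDBSERVICE)
    )
  )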

This seems like quite a neat solution to our problem - and does seem less of a 'hack' than ssh tunnelling.

Of course if the Connection Manager (or its host) goes down the connection will be dropped (same as for ssh tunnelling). So for example if I do this on the hop server

 cmctl shutdown abort

then my client connection does this


select sysdate from dual
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 17512
Session ID: 632 Serial number: 1466


In summary - this looks like quite a neat feature (and I've only scratched the surface) - I don't really know why it's as hidden as it appears to be - maybe there just aren't too many use cases for it.

I'm sure there are other useful parts to it, and I'm sure the method that Martin explains in his blog probably has some advantage over the basic one I used here - I need to look into it more to find out what that advantage is.

Reverse tunneling for DBA's



After quite a long time I'm finally writing up a new post for your digestion. The reason for the long gap is that I'm going through something of a career change at the moment and I'm essentially walking away from DBA work to go forth as a cloud guy - more on that in a later post......

Anyway, I thought I would share another useful connectivity technique (I'm not going to call it hacking, although it does feel vaguely like that). This may or may not be useful for you, but I think it's a useful technique to share as it could give you a temporary workaround in times of crisis......

So in our case this is the scenario - we have a nice lovely shiny new public cloud (let's call that Azure for argument's sake) and we have a big old mean datacentre with our 'legacy' stuff in it (let's call that on prem - a phrase which still annoys me inordinately).

In our world on prem can talk to Azure on a number of ports but nothing at all is allowed back directly - this completely kills any interfaces that want to work that way round.

Now in the future the firewall policy may be relaxed to allow specifics through but for the moment while we are still evaluating security around all of this it stays firmly shut.

I can illustrate this with the excellent diagram below:



So we can't connect anything back then right......?

Well wrong actually if you know what you are doing with 'reverse ssh tunneling'.

Tell me more I hear you ask - well OK here we go.

So, in the very simple case above, let's assume we have a Linux box on prem with a database on it and a Linux box in Azure that wants to connect to this database - let's show how this reverse tunnelling can be set up.

So to initiate the reverse tunnel I run this command from the on-prem box:

ssh -R 4444:on-prem-server:1521 azure-server
In my case I have ssh keys set up to allow this to connect without a password (but that's not necessary if you just want to do a test - just be aware that as soon as the session dies the tunnel of course dies too). There are extra flags to create the tunnel as a background process, shown in the sketch below, but I'm ignoring that for the main example.
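For reference, the extra flags I mean are -f (drop into the background) and -N (don't run a remote command), so a background-only version of the same tunnel would look something like:

ssh -f -N -R 4444:on-prem-server:1521 azure-server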

If you have a putty session open when you run this you'll just end up with a session logged in to the remote server that you can use as normal.

However what it's cleverly done in the background is create this reverse tunnel - let's dissect what the command options have actually done.

Well, above you can see the mention of 2 separate ports and 2 separate servers - let's cover each one separately.

1)  ssh -R 4444:on-prem-server:1521 azure-server

Here port 4444 is the endpoint on the Azure server that I will later be able to use to connect through - after the ssh command is established you will see port 4444 listed as listening in the netstat output:

netstat -an |grep 4444
tcp        0      0 127.0.0.1:4444          0.0.0.0:*               LISTEN
tcp        0      0 ::1:4444                :::*                    LISTEN

(Note here that by default the ssh daemon will only open the port on loopback addresses - this is configurable though, should you want to change that and have the tunnel accessible from other servers in Azure via this initial server - see the sketch below.)
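The sshd setting in question is GatewayPorts on the Azure server - a sketch of the change (it needs an sshd restart to take effect, and obviously think about the security implications first):

# /etc/ssh/sshd_config on azure-server
GatewayPorts yes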

2) ssh -R 4444:on-prem-server:1521 azure-server

Here on-prem-server is the endpoint that the tunnel will come back to - in this simple case it's the same place that I initiated the ssh from, but it doesn't have to be - as long as the initiating server can get to the endpoint then that will work too.

3)  ssh -R 4444:on-prem-server:1521 azure-server

Here 1521 is the local endpoint that we want the traffic to come back to - so in this case we want the remote endpoint of 4444 to send traffic back to 1521 on the on-prem server - i.e. the listener of course.

4) ssh -R 4444:on-prem-server:1521 azure-server

And finally azure-server is the server that we want connections to come back from - as mentioned earlier, if this is the actual server that needs to connect back then the tunnel will work fine; if it's some other Azure server then the ssh daemon config needs to be changed to allow ssh to act as a gateway - i.e. listen on a non-localhost address.

OK, so now that's hopefully explained, let's show how you can connect back through this tunnel.

I'll demo it just using sqlplus with ezconnect syntax as it's the simplest thing to use - so to establish a connection back over the reverse tunnel I just need to point my sqlplus connection at the new locally listening endpoint on the Azure server on port 4444 - so that just looks like this and is executed on the Azure server:


sqlplus user/password@//localhost:4444/DBNAME

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jun 29 10:41:12 2017
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Last Successful login time: Thu Jun 29 2017 10:41:03 +00:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL>
And there you have it - it connects just fine through port 4444, which gets reverse tunnelled back to 1521 on prem - very neat and also quite terrifying at the same time - I can hear firewall admins screaming from here.

I attempted to show the picture of that here - not quite as slick as the last one (the vague purple line is meant to show the connection coming back up the tunnel)

For some reason this whole technology reminds me of the parasitic fish that swims up your urine if you piss in a river - more on that here: https://www.animalplanet.com/tv-shows/river-monsters/videos/fan-favorite-fish-swims-up-urine-stream



Joining windows azure machines to Azure Active directory (AAD) using AADDS



So my foray into the cloud world continues and I've been getting more and more involved in the Windows server aspects of it. Now I know if we were truly doing proper full-on cloud we wouldn't be using traditional servers, we'd be using all PaaS components and we would be truly 'serverless'. However, reality means that this isn't the case and a lot of our migrations to the cloud will involve retaining traditional-style IaaS machines just running in a 'different datacentre' - in this case Azure.

The fact that we still need traditional servers means we need traditional tools and processes to manage them, which in the Windows world means you basically have to have Active Directory in some form or other. Now, managing a fully fledged AD install is not something I want to take on - there are justifiably roles in large organisations doing just AD - we don't have the resources or skillset for that.

So what are our options here?

Well, relatively recently (I'm not sure of exact dates here) Azure AD (the PaaS offering for AD) was extended to add Azure AD Domain Services (AADDS) - stick with me on the whole multiple acronyms thing.

These domain services add a number of features that make it much easier to manage machines in a more traditional way without the headache of full AD installation and support.

So having said all that, how do I activate this, make use of it and join a machine to my domain?

Well assuming you have Azure AD set up and configured........

The first step is to visit the old ASM portal (https://manage.windowsazure.com) - unfortunately this functionality hasn't made it to the new ARM portal (https://portal.azure.com) yet. I'm sure it will be coming pretty soon though, as the old portal is being phased out more and more every day.

Here in the old portal I click on the enable option for domain services - see pic below.




After a while (maybe 30 mins or so) an initial server will appear, followed by a 2nd one a few minutes later. These servers are domain controllers in the traditional sense - however you have no access to them at all via RDP - they are essentially a PaaS component. (You can do stuff indirectly and I'll talk about that more in a later post.)

Once you have these available you can move to the next stage.

The next stage is just making sure you have an account in Azure AD that is a domain administrator - you can check and add this in the new portal (it's only the activation of domain services that is in the old portal). Just make sure you have an account created that has the right shown below.


At this point, as long as the ASM (classic) network can talk to the network where the servers you want to domain join are located - either via peering or some other method - you should be able to join the machines in the normal way.

So that's just a case of accessing the Server Manager screen and clicking on workgroup


Then typing in the domain name you chose for Azure AD, i.e. xxxxxx.onmicrosoft.com


Then enter the credentials for your domain admin login in the form DOMAIN\user or user@xxxxx.onmicrosoft.com (either syntax will work)


Then the join works


And just a reboot is required


Job done - I can now log on with my domain account to manage the server and make use of domain services such as DNS and group policy - both of which I'll write up shortly to expand a little more on what can be done.

If you're moving to Azure and still have to manage traditional stuff these steps are pretty much essential to have any chance of managing the estate. With AADDS we get the benefits of AD without the complexity.

Sending emails via smtp relay from azure via office365



Not the snappiest of titles I know - and actually quite a dry post without really having any screenshots - but hey, they can't all look great can they......

This is just a short summary of what I had to do to set up an SMTP relay on a Red Hat Linux machine running in the Azure cloud, utilizing a mail account in Office 365. This enables us to send emails directly from other machines in Azure that could not easily integrate with Office 365 using 'modern' methods - at least not without some redevelopment effort. This setup is another enabler in our short-term goal of lift-and-shift migrations versus our long-term goal of modernizing to use PaaS services.

So how did I go about this - here is the short series of steps required to get this working.

1) Install some extra software - postfix is the obvious mail tool - the other two provide the SASL authentication bits that make the Office 365 part work; I can't say I know exactly what they do (and I've not bothered to research it either....)

yum install postfix cyrus-sasl cyrus-sasl-md5

2) Update/add the following lines in /etc/postfix/main.cf, the mail config file

relayhost = smtp.office365.com:587
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
sender_canonical_maps = regexp:/etc/postfix/canonical_maps_sender
inet_protocols = ipv4
smtp_sasl_auth_enable = yes
smtp_tls_security_level = may
smtp_sasl_security_options = noanonymous

The first line is obvious - it's just the SMTP address of Office 365 along with the port number; the next 2 lines reference additional files we'll create in a sec. The 4th line just stops loads of annoying messages about IPv6. The last three are needed to make the authentication and TLS work - don't ask me to expand on it any more than that.
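If you'd rather not hand-edit main.cf, postconf can make the same changes for you - for example:

postconf -e "relayhost = smtp.office365.com:587"
postconf -e "smtp_sasl_auth_enable = yes"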

3) Create the config file containing the credentials for office365

touch /etc/postfix/sasl_passwd
chown root: /etc/postfix/sasl_passwd
chmod 600 /etc/postfix/sasl_passwd

Once the file is there with the right permissions insert this content

smtp.office365.com yourusername@your.domain:yourpassword

Then get this content postmapped (I'm not sure if this is necessary if you are restarting postfix - but I ran it anyway)

 postmap hash:/etc/postfix/sasl_passwd


4) Create the config file to remap the 'from' address to avoid the error "550 5.7.60 SMTP; Client does not have permissions to send as this sender"


touch /etc/postfix/canonical_maps_sender

Then add the following entry - this remaps any from address postfix sees to the from address that is allowed by the account that is authenticating to Office 365.

/.*/ yourfromaddress@your.domain

then postmap this (again not 100% sure if this is needed or if a restart creates it anyway)

 postmap /etc/postfix/canonical_maps_sender

5) Then start the postfix daemon and set it to auto start on system startup

service postfix restart
chkconfig postfix on


And that's it - now if you configured everything right a simple test mail command like this should send you a mail just fine

echo test |mail -s this-is-test-subject anyemailaddress@any.domain

That will send an email from the office365 mailbox to whoever you want to send it to.

Pretty simple in the end to configure, although I can tell I've only scratched the surface of what postfix can do.......

All the clients have to do now to use the relay is point at the Linux box on port 25 and away they go (with the caveat below).
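One caveat here, based on the stock Red Hat postfix defaults rather than anything I hit in this particular setup: main.cf often ships with inet_interfaces = localhost, so for other machines to be able to use the relay you may need something along these lines (adjust mynetworks to your actual client subnets, and open port 25 in the firewall):

inet_interfaces = all
mynetworks = 127.0.0.0/8 10.0.0.0/8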

Patching linux servers in Microsoft Azure



As part of our cloud migration we still have to manage a large number of traditional-style machines (IaaS in new-world terminology). These cover both Windows and Linux platforms - and in our case a mix of many different flavours of Linux.

There are a number of ways to deal with patching - basically exactly the same ones you could use today on premises - the cloud doesn't change any of that. It does however add a new option, using some features of the OMS management platform (which can supposedly be made to work pointing back at on prem - so it could be an on-prem management solution too).

In this post I'm just going to talk about the Linux patching, as the Windows one will be slightly longer - I'm going to include some stuff on WSUS in that and I don't have time to write it up just yet.

So OMS (or Operations Management Suite, to give it its full name) is a 'product' that covers quite a lot of features and is touted by Microsoft as the ideal tool to do most of your management. I'm not going to talk about any of the other stuff - I'm going straight into the patching topic.

The first thing you need to do to make use of OMS is of course to enable OMS itself - I'm not going to cover that - it's just a case of finding the right screen and enabling it, which I'm sure is possible for even the newest person to Azure.

Once the OMS portal is created for you - the next thing to do is enable the solution that deals specifically with patching - so navigate to the solution tile (shown below).


Once in the solutions screen you then need to choose this one

Once that's imported you'll have some new tiles but there will be no data being loaded. To enable the data to be gathered on the state of the patching (and to allow the patching to be done) you'll need to install the OMS agent. Strangely this is not available as an extension yet and you have to do it by other means - either manually or by writing your own scripts to add it.

You can find the details of the manual process by going to the settings tile and then clicking through to the Linux computers section - see pic below


So if you follow those steps you'll end up with the OMS agent uploading data into the OMS server which is then automatically reported on. Note that the OMS agent uploads to a public endpoint so your server needs to be able to reach that via normal internet routing or hairpinning or via some sort of web proxy - but the address has to be reachable.
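For reference, the manual install those instructions walk you through essentially boils down to downloading and running the onboarding script with your workspace ID and key - roughly like this (check the docs for the current URL and flags; the ID/key placeholders are obviously yours to fill in):

wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh
sh onboard_agent.sh -w <workspace id> -s <workspace key>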

After some period of time the data will be uploaded and if you navigate back to the updates tile you'll see something similar to this


Nice summary - and you can drill into this to see more details - you can also build your own custom tiles should you wish just based on OMS query language (which is kind of similar to SQL).

Now that's all well and good for reporting but what if we actually want to patch stuff - how do we go about that?

Well first up you need to enable it as it's a preview feature - so navigate to this screen and do that.


Once that is done we can schedule the updates from the tile in OMS




Then we click on the add button and fill in the details and click save


Then we can see it is scheduled



Then after some time we can see it completes and we can go and look into the details


Nice and easy - gives a nice central GUI to control all os patching rather than having multiple tools and a really nice reporting engine too.

However a few warnings

1) It may (depending on the nature of the patches) reboot the system - there is no warning of this.... sample output below

Broadcast message from root@server(Mon 2017-07-10 08:11:14 UTC):

OMS Agent initiated shutdown after update.
The system is going down for reboot at Mon 2017-07-10 08:26:14 UTC!

2) There is no granularity over which patches to apply - it's all or nothing (my guess is this will improve in the future).

3) When it doesn't work (I had a couple of problems initially just scheduling the patching) it's hard to get a decent error message out - one of the problems I had was that a resource lock at the subscription level prevented a job from scheduling patching, even though it was just a delete lock - very strange and hard to track down.

In summary it's quite a nice interface, though a little basic in places - I can see this improving a lot over time, to the point where it may truly be the only tool you need. For now we're using it, but we're looking into other tools to see how they compare.

The other thing to mention is the cost - this OMS 'pack' (or whatever the term is) is not cheap and I think it is very overpriced for what you currently get (essentially the OMS agent runs yum update - little more than that). This is something else Microsoft needs to re-evaluate if they want to drive customers on to it.

Oracle ASM in Azure corruption - follow up



A few weeks ago I was contacted by the ASM product manager in a follow-up to my post on the corruption issues I'd had with ASM in Azure. We had a conversation about the issues and he told me that Oracle was going to be doing some more investigation with Microsoft, as it was believed that the issue really shouldn't have happened in the first place (even ignoring my workaround). A couple of weeks ago he gave me an update and I'm sharing that email below (Jim - I think it sounds better to include most of the email as it has some extra explanatory info that people may find interesting - I took out your email address to avoid you getting spammed by everyone :-))


Richard,

The last time we spoke, I indicated that Oracle would be doing independent testing of the corruption issue you first reported in your blog. Today, I would like to bring you up to date about what we found and also request an update to your blog to reflect our understanding of the problem.

The testing was a joint effort between Microsoft and Oracle. Microsoft's engineering team provided  the identification of the critical bug. The testing began by reproducing the corruption with much of the same configuration reported by you. We were readily able to reproduce the exact same corrupted data pattern. Our testing also revealed that certain Linux kernels experienced this problem while others did not have a problem. Microsoft seemed to have a pretty good idea of what the problem was and provided Oracle a patch. 

With the Microsoft supplied Linux kernel patch we could not recreate the corruption regardless of load placed on the database. The patch modifies memory block management handling in the kernel running in the VM associated with doing IO. The problem scenario is that when an Oracle database utilizing ASM runs in an Azure VM, Microsoft's paravirtualization driver (storvsc), running in the VM, interacts with the kernel IO buffer logic to cause wrong data to be written by the database log-writer. The nature of the corruption is unpredictable and happens infrequently, however under heavy load, without the patch the corruption is easily reproduced.

The nature of the bug is that errors are not reported at the moment of the corruption. It is only later when database's archive process reads the redo logs that the corruption is detected and reported. It is not known if there are other silent data corruptions occurring to other files and are simply not reported.

There are a number of circumventions reported by you and others, including not using ASM for database storage, treating the ASM disks as 4K sector devices, avoiding the 3.10.0-514 Redhat kernel, and using ASMLIB for device management. At this point, if at all possible, Oracle recommends customers simply avoid this particular kernel in an Azure environment. They could use Oracle's UEK kernel (Oracle Linux Azure VM) or an older Redhat kernel. The other workarounds of treating ASM disks as 4K sector disks and/or using ASMLIB, while likely effective, do involve additional management efforts, and we're not completely certain that the issue is entirely avoided. 

With respect to your blog, I request that it be updated to include the following points:

·         LUNs in an Azure are presented as 512e devices. That means that internally they are structured as 4K "sector" disks (physical), but emulate 512-byte sector disks (logical) from an application perspective. I put "sector" in quotes because this is not really true of SSD disks, but the Advanced Format Disk (512e) specification was written with conventional rotating disks in mind. ASM and the Oracle database work correctly with Advanced Format disks in 512-byte emulation mode, and it is not necessary to create 4K disk groups for correct operation. Some flash storage vendors recommend doing so, but strictly for performance reasons associated with their particular products. There is no need for ASM to detect sector size in this respect.
·         The bug discussed here is with particular Redhat kernels. This bug is exposed in an Azure virtualization environment with ASM. The kernel we know to be problematic is 3.10.0-514. There may be other kernels having the problem, but we could not reproduce the issue with Redhat kernel 3.10.0-327 or Oracle’s current UEK kernel.
·         We do not know if creating 4K sector disk groups is a complete fix. At best it circumvents the bug. Our testing seems to verify it as a reliable workaround, but there may be situations where data is still silently corrupted.


Thank you for reporting this issue. Oracle takes the issue of data corruption as one of our most important concerns. If there is other information I can provide or if you would like to discuss this by phone, please let me know.



Installing Zabbix into Azure using a MySQL PaaS


For those of you that don't know, Zabbix is an open source monitoring tool very much like Nagios (which I think is slightly better known). I blogged a while ago about how to set this up using Oracle as the back-end system here, but now I'm having to set this up in Azure I've gone a different route.

There is no Oracle PaaS solution in Azure (and there likely never will be), however there are a couple of PaaS solutions for MySQL - one of these has recently been added by Microsoft themselves and actually seems much better value than the other offerings, so I wanted to try setting up Zabbix using that as the backend so I don't have to worry about managing MySQL (which I still know very little about).

More details on the Microsoft offering https://azure.microsoft.com/en-us/services/mysql/

In this post I'll take you through the steps I went through to set up a Zabbix server from scratch on a Red Hat 7.3 IaaS box connecting to a MySQL PaaS backend repository database. Even if you have no interest in Zabbix there are some useful bits of information in this post for general use - I certainly learnt quite a lot doing it.

I'll start off by provisioning the MySQL PaaS - so here we go.

1) Go to the Azure portal and search the marketplace until you find "Azure Database for MySQL" and click 'create'


2) Fill in the basic details required for the service - see basic example below - there is not too much that is really selectable


3) Then click create on this page and a couple of minutes later the DB server is provisioned and we can navigate to its main maintenance page, which looks something like this - I've purposely highlighted the connection settings page to show that it gives you the basic connect string formats for most common tools.



4) Now it's provisioned I want to connect to it to make sure all is OK - and this is where my lack of MySQL skills combined with some subtle differences in Azure connectivity caused me to stumble for a while. Here is what happened:

First attempt:

[root@redhat conf.d]# mysql -u myadminaccount -p -h myazurepaasservername.mysql.database.azure.com
Enter password:
ERROR 2001 (28000): The connection string may not be right. Please visit portal for references.

OK, this took me a while to get further with - it turns out the username format has to be specified slightly differently

Second attempt:

[root@redhat conf.d]# mysql -u myadminaccount@myazurepaasservername.mysql.database.azure.com -p -h myazurepaasservername.mysql.database.azure.com
Enter password:
ERROR 2001 (28000): SSL connection is required. Please specify SSL options and retry.

OK - good progress, but network encryption with SSL is required - so how to do that?

Third attempt:

Download the cert (this is the same for any MySQL PaaS)

wget  https://www.digicert.com/CACerts/BaltimoreCyberTrustRoot.crt
--2017-07-11 10:19:25--  https://www.digicert.com/CACerts/BaltimoreCyberTrustRoot.crt
Resolving www.digicert.com (www.digicert.com)... 64.78.193.234
Connecting to www.digicert.com (www.digicert.com)|64.78.193.234|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 891 [application/x-x509-ca-cert]
Saving to: ‘BaltimoreCyberTrustRoot.crt’

100%[==============================================================================================================================>] 891         --.-K/s   in 0s

2017-07-11 10:19:26 (230 MB/s) - ‘BaltimoreCyberTrustRoot.crt’ saved [891/891]

Now use openssl to convert that cert from DER to PEM format

[root@redhat ~]# openssl
OpenSSL> x509 -inform DER -in BaltimoreCyberTrustRoot.crt -out MyServerCACert.pem

That creates the .pem file that I can now reference in my connection - so let's try again

[root@redhat ~]# mysql -u myadminaccount@myazurepaasservername.mysql.database.azure.com -p -h myazurepaasservername.mysql.database.azure.com  --ssl-ca=./MyServerCACert.pem
Enter password:
ERROR 2003 (28000): Client with IP address x.x.x.x is not allowed to access the server.

And now the firewall blocks it. (A slight aside before we cover that - the certificate info can be specified in a global config file (/etc/my.cnf) so we don't have to pass it each time - sample content here:)

cat /etc/my.cnf
[client]
ssl-ca=/root/MyServerCACert.pem

Right, back to the firewall - let's add the IP that was reported in the error into the MySQL PaaS config - that's done on this screen


And now Fourth attempt:

[root@redhat ~]# mysql -u myadminaccount@myazurepaasservername.mysql.database.azure.com -p -h myazurepaasservername.mysql.database.azure.com  --ssl-ca=./MyServerCACert.pem
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 65020
Server version: 5.6.26.0 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

And we're in!


Right stage 1 complete - now to go and do the zabbix server install - then we'll come back and hook them up together.

So for the zabbix server install on RHEL 7.3 here are the steps

1) Load the basic software

 yum install httpd httpd-devel mysql php php-cli php-common php-devel php-pear php-gd php-mbstring php-mysql php-xml

2) enable apache to autostart and also start it

chkconfig httpd on
service httpd start

3) Add the zabbix repo to the existing set

rpm -Uvh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-release-3.0-1.el7.noarch.rpm

4) install zabbix components from that repo

yum install zabbix-server-mysql zabbix-web-mysql zabbix-agent zabbix-java-gateway

5) update the date.timezone parameter in /etc/httpd/conf.d/zabbix.conf

<Directory "/usr/share/zabbix">
    Options FollowSymLinks
    AllowOverride None
    Require all granted

    <IfModule mod_php5.c>
        php_value max_execution_time 300
        php_value memory_limit 128M
        php_value post_max_size 16M
        php_value upload_max_filesize 2M
        php_value max_input_time 300
        php_value always_populate_raw_post_data -1
        php_value date.timezone Europe/Berlin
    </IfModule>
</Directory>

6) restart apache to pick that up

service httpd restart

And at this point we have the basic software in and running but we need to now link the two components together

So the first step now is to create the database objects in the database - the file containing the commands comes as part of what we just installed - so let's go and load it

cd /usr/share/doc/zabbix-server-mysql-3.0.9
gunzip create.sql.gz

Now we have the create.sql file, which contains everything we need to create - we just need a database to put it in. So let's log in to the PaaS and create a new zabbix database/schema (whatever the term is in MySQL :-))

MySQL [(none)]> CREATE DATABASE zabbixdb CHARACTER SET UTF8;
Query OK, 1 row affected (0.26 sec)

MySQL [(none)]> GRANT ALL PRIVILEGES on zabbixdb.* to myadminaccount@localhost;
Query OK, 0 rows affected, 2 warnings (0.61 sec)


MySQL [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.77 sec)

MySQL [(none)]> quit
Bye

Now let's load the objects in - there's loads of output here so I removed most of it

MySQL [zabbixdb]> source create.sql
Query OK, 0 rows affected (1.03 sec)

Query OK, 0 rows affected (0.52 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.48 sec)
Records: 0  Duplicat

etc
etc 

OK - all seems good so far - now let's start the Zabbix server and hook these things together


chkconfig zabbix-server on

[root@redhat zabbix-server-mysql-3.0.9]# service zabbix-server start
Redirecting to /bin/systemctl start  zabbix-server.service
Job for zabbix-server.service failed because a configured resource limit was exceeded. See "systemctl status zabbix-server.service" and "journalctl -xe" for details.

And this is where it got tricky again - let's try and get some more info on what is wrong...

[root@redhat zabbix-server-mysql-3.0.9]# systemctl status zabbix-server.service
? zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: resources) since Tue 2017-07-11 11:00:28 UTC; 3s ago
  Process: 38203 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=1/FAILURE)
  Process: 38211 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 38202 (code=exited, status=1/FAILURE)

Jul 11 11:00:28 redhat systemd[1]: zabbix-server.service never wrote its PID file. Failing.
Jul 11 11:00:28 redhat systemd[1]: Failed to start Zabbix Server.
Jul 11 11:00:28 redhat systemd[1]: Unit zabbix-server.service entered failed state.
Jul 11 11:00:28 redhat systemd[1]: zabbix-server.service failed.

That's pretty useless, but a quick google revealed this was something SELinux-related, so I installed the setroubleshoot software (which had about 80 pre-reqs.....)

yum install setroubleshoot

From one of the tools in this set I could then run this to tell me what was wrong

 ausearch -m avc,user_avc,selinux_err |grep zabbix |grep denied

which showed this

type=AVC msg=audit(1499771577.661:620): avc:  denied  { setrlimit } for  pid=38708 comm="zabbix_server" scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:system_r:zabbix_t:s0 tclass=process

To get SELinux to allow this I can use another utility, audit2allow, to create a new policy module that allows it

ausearch -m avc,user_avc,selinux_err -ts recent|grep zabbix |grep denied | audit2allow -M zabbixpol
******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i zabbixpol.pp

I then run the command line it mentions and the zabbix server starts OK.

Now I need to open the firewall to allow me to connect to the Zabbix website

[root@redhat zabbix]# firewall-cmd --zone=public --add-port=80/tcp --permanent
success
[root@redhat zabbix]# firewall-cmd --reload
success

Now we can switch back to GUI mode to complete the setup - so let's navigate to the Zabbix website, which is just http://servername/zabbix



We move to the next screen for DB connection properties - and at this point I realized I didn't really want to store the admin account credentials in the actual application - so I went ahead and created a new zabbix account to be used for logging in to the database.

This was just a case of running this:

MySQL [zabbixdb]> grant all privileges on zabbixdb.* to zabbix@'%' identified by 'password';
Query OK, 0 rows affected, 1 warning (0.75 sec)

MySQL [zabbixdb]> flush privileges;
Query OK, 0 rows affected (0.74 sec)

I then used these credentials in the config screen.


And this is where I got into more issues - it started with this - which was an easy fix (similar to the issue I had on the command line, so I just had to extend it to the other username format).


Even after that though I couldn't connect - just getting the message 'error (13)' - this again was a very tricky one to solve - in the end it was some combination of these 3 changes that got me past it

setsebool -P zabbix_can_network=1
setsebool -P httpd_can_network_connect_db=1
semanage port -m -t http_port_t -p tcp 10051
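If you want to sanity check that those changes actually took, the standard selinux tools will show you (nothing zabbix specific here - just checking the booleans are on and that 10051 is now listed against http_port_t):

getsebool zabbix_can_network httpd_can_network_connect_db
semanage port -l | grep http_port_t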

After that though I hit the ssl connection error that I saw from the command line - and after a lot of digging it seems that the php code supplied uses a mysql api that doesn't support ssl connections - it's possible this could be changed by someone who knows what they are doing, but I left it for the moment and changed the PaaS to allow non-ssl connections


After that it connected fine and i moved on to the next screen


And the next


Then it seems to have worked


Let's try and login


And we're good!



So there you have it - Zabbix works fine with a PaaS database - a little fiddly in some parts, but some of that was me just getting to grips with how things work. selinux threw a few spanners in the works - so if you don't care about it, switch it off and save yourself some pain.

By the way the picture at the top is a monitor on a platform and this is a monitoring tool being installed on a platform - see what i did there......? :-)

Sorry - there is more to this, as I discovered when trying to get the agent on the server to work....

Quick addendum on additional steps........

chkconfig zabbix-agent on
service zabbix-agent start

This fails to start due to selinux - so we do the same trick as before

ausearch -m avc,user_avc,selinux_err -ts recent|grep zabbix |grep denied | audit2allow -M zabbixagentpol
******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i zabbixagentpol.pp

Now we start OK

service zabbix-agent start

Now we enable some more ports in the firewall

firewall-cmd --add-port=10050/tcp --permanent
firewall-cmd --add-port=10051/tcp --permanent
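These are --permanent rules, so as with the web port earlier the firewall needs reloading before they actually take effect:

firewall-cmd --reload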

Then we have to update the config file /etc/zabbix/zabbix_server.conf with the mysql location - i had thought the GUI had done this - but it seems that's just for the php web config and not the backend server config - so the following lines need to be updated

DBHost=yourpaasname.mysql.database.azure.com
DBName=zabbixdb
DBUser=zabbix@yourpaasname.mysql.database.azure.com
DBPassword=yourpassword
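The zabbix server then needs a restart to pick up the new connection details - something like:

service zabbix-server restart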


And for the final twist in the tale there is yet another selinux thing to allow, as once the connection gets further port 10051 has issues :-) - so the fix for that is this

ausearch -m avc,user_avc,selinux_err -ts recent|grep zabbix |grep denied | audit2allow -M zabbixserver
******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i zabbixserver.pp

After that - and as long as you have the correct server/ip/active server in the agent config - the damn thing finally appears OK and goes green.....


Changing user passwords and expiry in Azure AD



We've now had users created for a while in Azure AD - and here I'm talking about users that have been created directly in Azure AD - not anything that has been replicated from some 'normal' AD on prem.

What we've discovered (surprisingly....) is that the passwords on these accounts expire and have to be changed (after we think 90 days by default - but not 100% sure on that).

This is an issue for some of them as they are 'service accounts' - i.e. not used by end users but by application services and interfaces. In a lot of cases we want these passwords to not be changing, as it's much easier to manage than having to plan and reconfigure application config every couple of months.

Not too surprisingly, the portal offers no facility to either reset a password to the same value or to set a password that doesn't expire - even as a global domain admin this functionality is not there in the GUI. I suspect that as Azure AD evolves this kind of stuff may be added, but as of now it's not there.

So to make these kind of changes you have to use powershell (and actually I think powershell is currently the only option for this)

Show me the script, I hear you ask - as if you found this page that's likely all you're really interested in - well here you go

Install-Module -Name Msonline

Connect-MsolService
Set-MsolUserPassword -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -ForceChangePassword $true -NewPassword passwordhere
Get-MsolUser -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -TenantId xxxx-xxxx-xxxx | Set-MsolUser -PasswordNeverExpires $true
Get-MsolUser -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -TenantId xxxx-xxxx-xxxx | Select PasswordNeverExpires

Now to expand a bit more for the people who may be interested in more details of the steps

Now in my case I was using powershell v5 - I would recommend going to that version if you haven't already, as a lot of azure related stuff seems very sensitive to powershell versions (and indeed azure add-on module versions)

Assuming you have powershell 5 and have hooked everything up you'll be able to install the msonline set of cmdlets to make the changes possible - that's this line

Install-Module -Name Msonline

Once installed you can then connect to azure (where you'll be prompted via a gui window to login)

Connect-MsolService

Once logged in you can use this code to choose a new password (here for the linuxtest user)

Set-MsolUserPassword -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -ForceChangePassword $true -NewPassword passwordhere

The next line sets the password to never expire

Get-MsolUser -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -TenantId xxxx-xxxx-xxxx|Set-MsolUser -PasswordNeverExpires $true


And finally just to confirm that the last thing did set - you can select back the status of the password expiry

Get-MsolUser -UserPrincipalName linuxtest@yourdomain.onmicrosoft.com -TenantId xxxx-xxxx-xxxx| Select PasswordNeverExpires

This is easy when you know how - but it took me a while to build this as I couldn't just find a simple example of this.

Note that in most simple cases you'll only have one tenant and the -TenantId parameter can actually be removed and does not have to be passed - it's just that in our case we have more than one and the default one was not the one I needed to work on.

Hopefully the GUI is improved soon - and hopefully domain services gets moved to ARM soon too as that is a real pain and the only part we haven't got in the new portal.

MySQL Paas password change - is this a bug or a feature?



Now I'm no MySQL expert which is one of the main reasons we are using the (relatively) new MySQL PaaS offering from Microsoft in Azure - today we've seen some odd behaviour when trying to do some grants and password setting. I'm not sure if this is how MySQL somehow behaves when you are doing things wrong or if this is a restriction of PaaS or indeed if it's a bug in the Microsoft implementation of a MySQL PaaS - see what you think.......

So here is the simple test case



MySQL [phpipam]> grant all privileges on phpipam.* to richtest@localhost identified by 'phpipamadmin';
Query OK, 0 rows affected, 2 warnings (0.28 sec)

That runs just fine and creates a new user called richtest and gives it a password - this syntax seems a little alien to me coming from the oracle world - it seems to do 2 things in 1 command - anyway, no matter, it seems to work fine

The odd thing is when I try and run it again - which as I understand it should just re-grant the rights and change the password - I get this message

MySQL [phpipam]> grant all privileges on phpipam.* to richtest@localhost identified by 'phpipamadmin';
ERROR 1044 (42000): Access denied for user 'admin'@'%' to database 'mysql'

So it seems that I need some rights on the core mysql database that I don't have - and indeed this syntax doesn't work

MySQL [phpipam]> set password for 'richtest'@'localhost' = 'newpass';
ERROR 1044 (42000): Access denied for user 'admin'@'%' to database 'mysql'

And neither does this (this syntax feels like hacking by the way - like updating user$ directly in oracle....)

MySQL [phpipam]> UPDATE mysql.user SET Password=PASSWORD('newpass')WHERE USER='richtest' and host='localhost';
ERROR 1044 (42000): Access denied for user 'admin'@'%' to database 'mysql'

However i happily have rights to drop the user......

MySQL [phpipam]> drop user richtest@localhost;
Query OK, 0 rows affected (0.25 sec)


This can't be right can it? I have to drop a user to be able to choose a new password? There is a separate azure super admin account (that I don't have the password for) that does seem to have more rights, but there is no facility in the PaaS GUI to change any user's password other than the default admin one you get.
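One thing that might be worth a try before resorting to drop and recreate (I haven't tested this against the PaaS, so treat it as a guess rather than a known workaround) is the ALTER USER syntax available from MySQL 5.7, which changes a password without going at the mysql schema directly:

# untested against the PaaS - assumes the 5.7 ALTER USER syntax is allowed here
mysql -h yourpaasname.mysql.database.azure.com -u admin@yourpaasname.mysql.database.azure.com -p \
      -e "ALTER USER 'richtest'@'localhost' IDENTIFIED BY 'newpass';"

If that comes back with the same access denied error then it does start to look like a restriction in the PaaS rather than just a syntax problem.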

Is this a PaaS bug? 

Thoughts from MySQL experts?

Tracking all your networks and subnets



Now that we've started building a lot more in Azure it's quickly become apparent that you need to plan the networking properly and have some tooling to help you with this.

I'm sure networking people already know about the tools etc that are on the market to help with this but we've come from a position of knowing very little and having to fill the knowledge gap.

I thought we'd share 2 tools that we have found very useful that help us with this - the first is this

http://lightmesh.com/subnet-builder/

This allows you to take a network and split it up into subnets - we found this really helped with visualising what we were doing and helped with the planning - the online tool is free to use - here is a screenshot from that to give you an idea - it's a lot nicer than most other subnet calculators I've seen.



And secondly is this one - which we have installed and use to track what subnets we have allocated and to help with our IPAM (ip address management).

https://phpipam.net/

This allows us to track all the subnets we create and what they are used for - it can do a lot of other stuff too, but we are just using the basics for now - we found trying to do this purely in Azure quite difficult, but this tool we quite like.

Again here is a quick pic to give you a flavour



There are loads of other tools out there and I'm not really an expert in this stuff but i certainly found them useful - in fact the first tool is useful even if you are not involved in any of the planning just to understand how subnets work a little better.

More oddity with MySQL PaaS



Whilst setting up the phpipam tool (the one i mentioned in my last post https://phpipam.net/) i decided to use a MySQL PaaS in Azure to host the database part rather than manage MySQL myself - however this proved a little tricky to get working.

Well i say tricky - it just needed the config file amended and then it was fine - however this was not at all obvious - let me explain what happened.

So after loading the software using the normal yum tooling you then have to link the app to a database via the gui screen below - you have a few options - I chose the last one, the manual install



 So i click that then follow the steps as described on the screen below - after that i put the login details in the config.php file and then the login button should redirect me to the login page



 However that wasn't happening at all and i was just being redirected to the main install screen - and it continued round in that loop with no error message coming out at all.....

So i started to dig into the php code to see what was happening ( i don't know php by the way so i was learning as i went along).

After a lot of trial and error and tracking the code path through i finally came across this piece of code

/* open persistent DB connection */
$database = new database($db['host'], $db['user'], $db['pass'], $db['name'], NULL, false);
if($database->connect_error) { $dbFail = true; }


This connects to the mysql database based on the contents of the config.php file - however if there are any errors all that happens is the value of the dbFail variable is set to true - but no actual errors about what is wrong are displayed.

So i added an extra line based on what i found on google for this kind of thing


/* open persistent DB connection */
$database = new database($db['host'], $db['user'], $db['pass'], $db['name'], NULL, false);
if($database->connect_error) { $dbFail = true; }
/* added by rich */
echo ('Connect Error (' . mysqli_connect_errno() . ') ' . mysqli_connect_error());


Now when i run the page i see this " connect error (9002) "



So progress - something to go on. In this case though nothing of any use comes up on google - however based on my experiences setting up another MySQL PaaS I suspected the username format might be the problem.

So i changed the config file from this


/*      database connection details
 ******************************/
$db['host'] = "mysqlpaas.mysql.database.azure.com";
$db['user'] = "phpipam";
$db['pass'] = "password";
$db['name'] = "phpipam";



to this


/*      database connection details
 ******************************/
$db['host'] = "mysqlpaas.mysql.database.azure.com";

$db['user'] = "phpipam@mysqlpaas.mysql.database.azure.com";
$db['pass'] = "password";
$db['name'] = "phpipam";


After i did that the connect error went away and i'm presented with the proper login screen.



I thought it worth sharing as this different username format for PaaS will likely cause a lot of confusion - I've been trying to find a note that describes this and why it has to be specified this way but no real luck so far.
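If you hit something similar, a quick way to prove the username format outside of php (before you start digging through application code like I did) is to point the mysql command line client at the PaaS with the same credentials - a minimal check, assuming the client is installed on the app server:

# -p prompts for the password, phpipam here is the database name - should just return 1
mysql -h mysqlpaas.mysql.database.azure.com -u "phpipam@mysqlpaas.mysql.database.azure.com" -p phpipam -e "select 1;"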

Hopefully this helps someone else

Multiple smtp relay send accounts for office365



This is a follow up on this post here about smtp and office365 - after some more investigation i've made some further changes as our use case changed slightly and the original solution didn't fit.

Originally we just needed to be able to send all emails originating from a single account and we had some postfix config that would do that.

Now however we want to be able to send smtp messages out from multiple accounts - the account being used to send out being dependent on the from address being specified in the mail.

For example, if the messages are from electricity@energycompany.com they should be sent out using that email address from the smtp relay; if they are from gas@energycompany.com then they should be sent out using that account.

The basic setup i had only allowed a generic@energycompany.com for every single mail that was sent.

So how did i fix this?

Well after a lot of reading and trial and error the solution is actually surprisingly simple - these are the parameters in question in the main.cf postfix config file

smtp_sender_dependent_authentication = yes
relayhost = smtp.office365.com:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = regexp:/etc/postfix/sasl_passwd
smtp_tls_security_level = may
smtp_sasl_security_options = noanonymous


The key ones really are the top one and the 4th one - these are basically saying make the authentication dependent on the from address being used, look that address up using pattern matching in the sasl_passwd file and use the credentials you find there to send the mail out









The contents of the password file being like so:

/^first.last@company.com$/ office365username1:office365password1
/^anotherfirst.anotherlast@company.com$/ office365username2:office365password2
/^yetanotherfirst.yetanotherlast@company.com$/ office365username3:office365password3


So to be explicit - if I send a message to the smtp relay on port 25 with a from address of first.last@company.com, the mail will be sent out using the mailbox in office365 accessed using the login office365username1:office365password1

If it comes from anotherfirst.anotherlast@company.com it then uses office365username2 - you get the idea.

Anything not matching the pattern gets rejected, i would imagine you can have a catch all at the end for a generic account that could send anything else out should you so wish.
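A couple of small things that aren't obvious from the above: because sasl_passwd is a regexp: table postfix reads it directly, so there's no postmap step needed, but postfix does need reloading after the main.cf change - and you can then test each from address easily enough (the recipient address here is obviously just an example):

systemctl reload postfix

# send a test mail through the relay with a specific from address
echo "relay test" | mail -s "relay test" -r electricity@energycompany.com someone@example.com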

This solves our simple requirements at the moment and means we can have a single smtp relay used by multiple applications to send mail out - without having to 'modernize' the applications to call office365 directly (which we should probably do at some point....)

Hope this is useful - I found it really hard to find an example on google I could just cut and paste from......

Tuning by understanding your application



This is a short post to illustrate how understanding how an application works can make all the difference to how you can tune it. This is a real world case which is actually pretty simple but I think it illustrates the point quite well.

The performance symptoms were as follows:

1) Live environment works fine
2) Copy of that environment is very slow performing a specific function - the data between the two systems is essentially the same - just sanitized for test (volumes and data distribution the same)

When we look into the specifics of the slowness (in this case a trader confirming a trade) we can see that the SQL that is running is actually pretty simple but performs completely differently between the two environments - the SQL is shown below:

with b as (
   select distinct BOOK  
   from   J_BOOK_CACHE_V2 
   where  session_id = 'c0287841-cec3-455b-b0a4-bac81418a055'
)
select    TRADE_TNUM        
FROM      tpow t,
          b
where     t.AUDIT_ALF <= TO_DATE('03102017','DDMMYYYY' ) 
AND       t.AUDIT_AOHM > TO_DATE('03102017','DDMMYYYY' ) 
AND       t.AUDIT_ZKEY = '5243123'
and       t.trade_book = b.book;


Now looking in more detail, the execution plan is totally different between the two systems - but why is that when the systems should be copies of each other?

The copy was actually done as an rman duplicate (then sanitised) and the spfile contents were copied exactly - so other than the sanitizing changes everything should be the same (including the stats).

So what's going on here - well, a closer look at the stats shows that the table in the with clause at the top has a few hundred rows in production but none in the test environment - why is that?
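An easy way to see what the optimizer thinks is in the table in each environment is just to pull the stats out of the dictionary - run something like this in both databases and compare (ZAINET being the schema owner used further down):

sqlplus -s / as sysdba <<'EOF'
select table_name, num_rows, last_analyzed   -- num_rows is what the optimizer works from
from   dba_tables
where  owner = 'ZAINET'
and    table_name = 'J_BOOK_CACHE_V2';
EOF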

Well after investigating how the application works it transpires that this J_BOOK_CACHE_V2 table is 'temporary' (well, its contents are temporary but it's still stored in a normal heap table). There is only data in there when users are logged in and doing things. On logout the data is cleared down again.

So with this behaviour now in mind we have a simple hypothesis:

1)    During the day people are logged in and data exists in the ‘temporary book cache’ table J_BOOK_CACHE_V2.
2)    At night everyone is logged off and the table is cleared down automatically by the logging out process.
3)    The stats job runs when the table is empty – so oracle thinks any query against this table will return no rows and adjusts any queries involving this table to expect that
4)    When the screen is run data then exists in the table (where oracle expects none) and because there are rows the way Oracle decided to run the query is wrong – hence the bad performance.
5)    In live there is generally always someone logged in when the stats job runs so the stats never say 0 and the plans are good…….

Anyway that was the theory.

So to fix it I ran the following series of steps (see below) – basically I insert some dummy data, gather stats while it’s there, lock those stats so they can’t be changed then delete the temporary data.





 
insert into J_BOOK_CACHE_V2 select distinct 'cac6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;
insert into J_BOOK_CACHE_V2 select distinct 'fhj6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;
insert into J_BOOK_CACHE_V2 select distinct 'jop6de6f-2b21-49b1-85b0-231a9dc1abc0RICH',book_book from book;


commit;


exec dbms_stats.gather_table_stats('ZAINET','J_BOOK_CACHE_V2',method_opt=>'for all indexed columns');


exec dbms_stats.lock_table_stats('ZAINET','J_BOOK_CACHE_V2');


delete from J_BOOK_CACHE_V2 where session_id like '%RICH';


commit;


Now when I re-run my test case from SQL the performance is the same as live (i.e. very quick)

Strangely enough this hadn't been a problem in test environments before and I can only assume that there was some data that had not been cleaned up properly by the logout process and this quirk was causing the stats to have some value and things to work OK.



Now this was a very easy fix and a simple example but the principle is very useful - to best tune your database you need to understand how the application works. Tuning just based on database/SQL tuning skills can get you a long way, but to get the best performing database, application/data knowledge is king.

Tracing a session SQL in "Azure SQL"



With our 'cloud first' strategy we are trying to make use of Azure SQL for all of our databases as much as possible. This (as we have discovered) does come with a few things you need to be aware of as this is not 'normal' SQL Server - there are various things that are not there and quirks about the way some of the things work.

In a case this week we were having trouble getting a certain process working and we wanted to trace what the application code was running inside the database. Now the PaaS GUI via portal.azure.com does offer some stuff - it basically picks out some of the top statements and helps you tune them - but it doesn't allow you to just pick a session and show everything it's doing.

So how to do this - my initial thought (and bearing in mind I was an Oracle DBA and not really a SQL Server expert) was to use SQL Profiler - I've used this in the past and it allows you to see everything that's going on - it's actually quite a nice tool and better than Oracle here in a lot of respects.

Problem is though there is no SQL Profiler.... Even if you take the SQL 2017 client and try and point this at PaaS it just doesn't work and is not an option - so what do we do?

Well after a bit of digging I found this blog note from Arun - which did exactly what I wanted to do (and actually the blog seems generally useful - one to remember):

https://blobeater.blog/2017/02/06/using-extended-events-in-azure/

In my case I just pulled out these small bits of SQL to achieve what I wanted - I don't know all the ins and outs and what extra options there are for this - it just does what I want and I didn't need to look any further than that yet......

First thing we do is create the following and enable it

CREATE EVENT SESSION azure_monitor
ON DATABASE
ADD EVENT sqlserver.sql_statement_completed(ACTION (sqlserver.sql_text, sqlserver.database_name)),
ADD EVENT sqlserver.database_xml_deadlock_report
ADD TARGET package0.ring_buffer (SET max_memory = 500, max_events_limit = 10)
;

ALTER EVENT SESSION azure_monitor ON DATABASE
STATE = START;

In my application I then run the function I'm having trouble with - then after that run the query below to show the sql that has been running - you seem to have to be quite quick to get this info - perhaps the settings in the code above can be made larger to keep more in memory so there is more to query.

DECLARE @ShredMe XML;
SELECT @ShredMe = CAST(target_data AS XML)
FROM sys.dm_xe_database_sessions AS s
JOIN sys.dm_xe_database_session_targets AS t
  ON t.event_session_address = s.address
WHERE s.name = N'azure_monitor';

SELECT QP.value('(data[@name="statement"]/value)[1]', 'varchar(max)') AS [SQL CODE],
       QP.value('(action[@name="database_name"]/value)[1]', 'varchar(max)') AS [Database],
       QP.value('(@timestamp)[1]', 'datetime2') AS [timestamp]
FROM @ShredMe.nodes('RingBufferTarget/event[@name=''sql_statement_completed'']') AS q(QP);

GO



This shows me output from the code running something like this:



Then I can see the code of interest (and everything else the session ran) - very useful

Then I just need to tidy up so we don't trace for ever

ALTER EVENT SESSION azure_monitor
ON DATABASE
STATE = STOP;

ALTER EVENT SESSION azure_monitor
ON DATABASE
DROP TARGET package0.ring_buffer;

DROP EVENT SESSION azure_monitor
ON DATABASE;
GO



Now we're back to normal - think this code will be very useful - thanks again Arun





Using a hardware token to log on to linux



PKI/hardware tokens are now quite a commonplace accessory in the IT workplace and are generally used to enable VPN connectivity to the office or to sign or encrypt emails. I knew they could be used to authenticate server (and even database access) but this always sounded like it was a very complex thing to do and would only ever really be appropriate in government/defence type use cases where even higher levels of access control are required.

In a conversation this week with our security guys though, they were saying that it's actually very easy to do for Linux and we should look into doing it - so I did just that......

Now like many people I use the excellent PuTTY tool for Linux machine access ( I love the way the url is so very British :-)) - however it seems this natively does not support the use of hardware tokens for authentication - instead there is a fork called PuTTY-CAC which includes this functionality (as well as everything putty had up to the fork version - which is pretty much everything).

You can download the installer or just the individual executables should you so wish.

Once the file is downloaded you launch it and at first glance it just appears to be exactly the same as normal putty - there is just a small extra section shown below

 
 
Clicking (in my case) on the Set PKCS Cert opens up a file explorer window where you need to locate the dll that is used to read/write from your token - in my particular case we are using cryptovision but there are many other providers that it could be - so I browse and find C:\Windows\System32\cvP11.dll
 
Once I do that the dll is accessed and the token queried to identify my certificate
 
 
 
I then OK this one which loads it into the putty-cac configuration - the screen now shows the thumbprint in the window.
 


 
 
The nice thing now is that once this is done I can click on the copy to clipboard option and the string I need to use on the Linux box is generated exactly for me - so I click that and get something like this
 
ssh-rsa blahblahblahblah PKCS:blahblahblah=C:\Windows\System32\cvP11.dll
 
Once I have that I then choose a Linux machine I want to try this out on - in my tests I used both Ubuntu and RHEL - both of which worked without issue.
 
Now the last stage is actually very easy, and if you are familiar with pki anyway (you may well have used keys to copy files between users on different Linux boxes without the need for passwords) the hardware token version of this is no different.
 
So steps are (and first 3 only necessary if you never used keys before):
 
1) decide on the user you want to try this out on - it has no link whatsoever to the owner of the token - so you could try it as rich or oracle or even root - any of them will do - all you are doing is copying the public key of the token into an account on Linux.
2) run ssh-keygen as that user in the $HOME directory to generate some of the basic stuff required - just accept all the defaults here
3) in $HOME/.ssh create a file called authorized_keys
4) Paste the contents of the string generated from putty-cac above into the authorized_keys file
 
And that's it - ready to test (now a warning here that for me with a default server install this just worked - but I know from past experience that ssh can be very picky on the exact file/directory permissions that need to be set on everything related to ssh to ensure it is secure - so if it's not working bear this in mind)
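For reference, these are the usual permission suspects if the key keeps getting rejected - run as the user you're trying to log in as (standard ssh housekeeping rather than anything putty-cac specific):

chmod go-w ~                       # home dir must not be group/world writable
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys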
 
So if I now with all that config in place launch putty towards my server initially everything looks the same and I am prompted for a login - now I enter rich/oracle/root (whatever I decided to test with) and this is now where it is changed - I'm now presented with a pki pin box for me to enter.
 
 
 
If I get the pin wrong I get this message and am denied access
 
Server refused public-key signature despite accepting key!

If I get it right though it logs me in just fine



And there you have it - this was really surprisingly easy and can give an even greater amount of security to your Linux access - another nice feature is that if someone leaves or the token is lost the certificate can be disabled so it no longer works - this is done via a certificate revocation list (much the same as https certificates)


 
 

More than one way to skin a cat (and by cat i mean azure)



Not sure what the origin is of the english phrase 'there is more than one way to skin a cat', but it seems vaguely appropriate when dealing with doing things inside Azure. There are multiple ways to achieve the same thing - for example (this is not an exhaustive list....)

1) Azure portal
2) powershell
3) Visual Studio
4) az command line
5) build your own app with the client sdk

you get the idea.....


Anyway this multitude of options has proved quite useful recently - we've been having an issue with the Azure portal in that the azure files option within azure storage just isn't working (something to do with our firewall/proxy setup I think) - we just see a sad little raincloud instead of the details we should be seeing - i.e. something like this






Now as well as being annoying - this also prevents us from working with them - i.e. adding new ones or extending the size etc

The good thing with having so many interfaces is that we can try another one of them to help us out.

In this case I'm going to use az command line, and the good thing is i can actually still do this from the browser - i don't need to install it anywhere or perhaps spin up a docker container with it preloaded on there - Microsoft helpfully added the cloud shell option into the browser (which recently had powershell support also enabled) - so how do i get to that?

Well simply click on the >_ symbol at the top of the portal page (easy to miss to be honest but it's a very nice feature) - then this happens




By default it goes into bash but it can be switched to powershell should you wish

Now you have a pre logged in command window where you can run the az command to work with the file storage.

The first thing you have to do is tell it the subscription you want to work with - that's just done by running

az account set -s subnamehere
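If you want to double check which subscription the shell is now pointed at you can just ask it:

az account show --output table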

Once you are working in the right subscription you then need to run the relevant storage command - to just list what shares you have you would run the following (the account key being the one you get from the portal - the string ending in ==)

az storage share list --account-key keynamegotfromtheportal --account-name storageaccountnamehere




In my case that currently returns a null entry

[]

To then add a new one I run (same key and account name as before)

az storage share create --account-key keynamegotfromtheportal --account-name storageaccountnamehere -n testshare --quota 100

So that creates a new share called testshare of maxsize 100GB - the command then returns this


{
  "created": true
}


I can now list out the shares by running the list command again


And there we have it - 1 differently skinned cat .....

Raising a RHEL support request in Azure



We've been using RHEL machines in Azure for a while now but had no cause to actually make use of the support option we're paying for with the licence/support included images we are using.

So how do you do this?

Well it's all integrated into the portal and on the VM screens you get a nice extra button - see pic below (one from bottom)

Clicking this then takes you to this screen

If you then click go to customer portal then this happens


AADSTS90094: The grant requires admin permission

Right........

So that's useful - I'm clearly missing some rights - but no idea which ones, and there seems to be zero documentation about what is required.

So we raised a support ticket with Microsoft to ask what is required - they basically didn't know.....

After some messing about though we found that the following works - which does seem like total overkill and maybe there is some lesser role that makes this work but it's fine as a temporary grant i guess.......

So we go to Azure AD and grant the user global admin on the directory using the super user azure accounts......


This runs fine


Then we try again as the linux admin user






Now it's happy and we accept the proposal......

Now we get forwarded to relate this to our existing redhat account (or create a new one) - think something was a little screwy with the redhat webservers when i went to this - see screenshots below - but it still worked.






After that we get this nice message and it's all linked up - we can now raise tickets and access the redhat docs. We can now revoke the directory role and it still continues to work fine (even against new servers we might raise tickets for - it's a one time grant).

So got there in the end. This needs better documenting I think to avoid people getting stuck on the same issue. Maybe there is some sort of assumption that the person raising tickets would also always have these global admin rights - but that really depends on how the Azure support teams are structured and how separation of duties is defined. For us at least os admins don't have full control in Azure AD - they have limited roles in the portal to do just what they need to.



I would think most large IT shops would work in a similar way - unless they've been overrun with bearded hipsters who are devops rockstars and do everything everywhere themselves.......

Fusing blobs in Azure.......



If you're reading this as a DBA you might be thinking I've written some plsql code to combine two blob column values into one - you'd be sadly mistaken - what I'm actually going to talk about is this thing

https://azure.microsoft.com/en-us/blog/linux-fuse-adapter-for-blob-storage/

What this enables is the mounting of azure blob storage directly as a unix 'filesystem'. Now azure file storage can already be done using the tried and trusted smb protocol (samba) and we already have experience of that working (albeit a little slowly) on existing servers. However blob storage could not be used in this way and all interactions would be via rest calls or using some sort of api (az/azcopy etc)

What interested me about this is that it would perhaps give me another option for being able to directly write oracle backups to azure storage without having some kind of intermediate step (I can't see Oracle ever releasing an api for rman to enable this as they don't want people putting oracle in Azure - only oracle cloud will do.......). Experience with the azcopy utilities had shown some incredible performance and I wanted to see if this was somehow replicated.

So i went about doing some testing of this.

As a prereq I created a blob storage account in the portal - some quick screen grabs of that just for reference - so go through the wizard


create a container inside it






And find the access keys we need later





The software itself is released by microsoft and you can see it all in github here https://github.com/Azure/azure-storage-fuse

I followed the install instructions from here https://github.com/Azure/azure-storage-fuse/wiki/Installation which in my case just equated to running

rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm

followed by

yum install blobfuse

Now continuing with the instructions i create a temporary location that it requires

mkdir /mnt/resource/blobfusetmp
chown oracle:dba /mnt/resource/blobfusetmp


Then i create a config file which contains the details and secret key for my azure storage i created earlier

vi /etc/azstorconfig.cfg

this contains

accountName oraclebkptest
accountKey blahblahblah==
containerName backups
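Since that file holds the storage account key it's worth locking the permissions down (assuming you're running the mount as root - adjust the owner if not):

chown root:root /etc/azstorconfig.cfg
chmod 600 /etc/azstorconfig.cfg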


Now i create an actual mount point where i want the thing visible

mkdir /fusebackups

So now we're ready to mount - so let's try it

blobfuse /fusebackups --tmp-path=/mnt/resource/blobfusetmp -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --config-file=/etc/azstorconfig.cfg
Failed to connect to the storage container. There might be something wrong about the storage config, please double check the storage account name, account key and container name. errno = 1600



And we fail miserably - with not much to go on - thankfully pretty much the only thing i found solves the problem - I am forcing https in my storage setup but the default is http - so i just need to add the flag --use-https=true

blobfuse /fusebackups --tmp-path=/mnt/resource/blobfusetmp -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --config-file=/etc/azstorconfig.cfg --use-https=true

And that runs without error - and indeed I can see it in the df -k output as this

blobfuse                              32895696   2146344  29055304   7% /fusebackups

The permissions look a little odd but it should be useable

 ls -ld /fusebackups
drwxrwx---. 2 32602 bin 4096 Aug 29  4439189 /fusebackups


however when i come to switch to oracle and take a look i see the following.....


d??????????   ? ?      ?       ?            ? fusebackups


Now that's a new one on me and I was a little baffled - but after a quick look at the man pages for fuse and some guesswork I added the following mount options to the mount command and tried again

blobfuse /fusebackups --tmp-path=/mnt/resource/blobfusetmp -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --config-file=/etc/azstorconfig.cfg --use-https=true -o uid=1001 -o gid=1002 -o allow_other


After this as the oracle user i see this - so looking good

drwxrwx---.   2 oracle dba  4096 Jan  1  1970 fusebackups

So we're ready for a backup now I think - so let's try it.......

RMAN> backup format '/fusebackups/%U' database;

which throws out this

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 12/06/2017 19:09:51
ORA-19504: failed to create file "/fusebackups/5sslfaii_1_1"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 1: Operation not permitted
Additional information: 4


hmm - that's not that useful - let's try just manually copying a file to the location

cp file /fusebackups

this runs without error but seems a little slow - in fact timing it for a 20MB file takes 20 seconds.....

time cp file /fusebackups
real    0m20.193s
user    0m0.000s
sys     0m0.006s


hmmmmmm

let's do an ls of what's there

ls
.
.
.
.
hang hang hang

And it never comes back - let's try again

and we get

ls: reading directory .: Machine is not on the network

double hmmmm

let's look in this cache directory that was mentioned right at the start


cd /mnt/resource/blobfusetmp
 find .
.
./root
./root/GOLDDB_hot_lv1_20171205_184_5oslcp91_1_1


And we see the file - but it seems to be in this limbo cache area and nowhere else - indeed the blob storage screen in the portal just shows this complete lack of files also


So at that point I gave up and decided maybe this isn't really production ready.........not sure that the rman interface would even work at all even if the basics of this were working - maybe worth revisiting next year as it could be useful for some use cases once it's more robust.
