Network automation with nornir
In this blog post I will switch to coding, though without running anywhere far from networking. I'm going to implement some automation for a frequent day-to-day task.
Task automation in networking
Speaking of automation, which has become a hot topic in networking in recent years, what are the options for getting the job done programmatically? Taking a high-level view, you can probably divide all the tools into two types. The first one is programming language libraries. Take your favourite language and you will probably find some ready-to-use libraries for sending commands to networking devices. Can't find one? You can always write it yourself if you are good at programming. And some ground zero in the form of an SSH client library probably already exists, so you can use it to automate your tasks by interacting with the device CLI. For example, you can use Netmiko if you know Python, or choose good old Expect (probably not so good anymore). Taking that path you get flexibility: you can update the automation library code to introduce new features, you can switch to another library if you want, or you can write your own. And of course, knowing programming allows you to solve many other tasks, not only networking ones. What are the cons? Some people find it hard to learn programming, and it is certainly very time-consuming. Advancing to the level where you can submit a merge request to some project is an even harder and longer path.
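To make the library path concrete, here is roughly what the smallest possible Netmiko script looks like (the address and credentials are placeholders, of course):

from netmiko import ConnectHandler

# open an SSH session to a hypothetical Nexus switch and run one command
switch = ConnectHandler(device_type='cisco_nxos', host='192.0.2.1',
                        username='admin', password='secret')
print(switch.send_command('show version'))
switch.disconnect()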
The second type is configuration management (CM) tools. You have probably heard the words: Chef, Puppet, Ansible, Salt. When searching the web for network automation articles, there is a high chance you will find some Ansible course. Why is it so popular today? Because of the domain-specific language (DSL) it uses. Yes, you need to learn another language, but it is way easier than programming. For a beginner it looks like writing some text in a YAML file, though it's not plain English. Even Python scripts can make more sense read out loud. And knowing Ansible doesn't make you know Chef or Salt too. But ease of use and the big community around it make these tools win over more and more network engineers' minds. Nevertheless, the biggest drawback here is your dependency on what vendors implement. Ansible, for example, isn't a network automation tool by itself; it is a framework that lets you write tasks using some modules and execute them against some devices. Modules provide the real functionality, and if Cisco hasn't made a module to check device health, you either have limited capabilities or must write your own. And guess what language Ansible modules are written in! So at some point a question can arise: why learn Ansible (or another CM) when you can learn Python (or another programming language)?
I myself started to invest in Python before I met network automation, simply because I had been interested in programming since school. Back then I learned Pascal, which is not very useful today. A couple of times I tried to grasp C,
but so far I have had no success with it, nor time for a new attempt. Then a friend suggested I learn Python, and that's how I got here.
What is nornir
Returning to the available tools and the question of whether learning a CM is viable. Such a question probably arose in the minds of some great engineers who had previously produced useful Python tools. And they decided not to simply
abandon everything a CM gives you, but to take the best from both worlds. So what is nornir? Nornir is an automation framework developed in Python, which gives you the capability to write operations (playbooks, tasks) and execute them, but doing it in pure Python [1]. You can think of it as writing an Ansible playbook in Python. It takes care of the plumbing around that, like inventory, operation execution, results output, debugging and concurrency. It provides you with plugins which allow you to use Netmiko and NAPALM in your operations from the start, but of course, since it's just Python code, you can do whatever you want! Or probably whatever Python allows you.
Task
I decided to write a set of nornir operations which, combined, will check a DC ToR switch for a VRF's status. Let me explain a little. I work with some DCs which run a Clos topology. Without diving into details, top of rack (ToR) switches run VRFs with some physical interfaces and SVIs mapped into them. So I want my task to check, for a given VRF name:
- what SVIs and physical interfaces are mapped into it
- what the administrative and operational status of those interfaces is
- what IP addresses those interfaces have
- whether there are IP neighbours (ARP, NDP) or MAC addresses learned
- what state the BGP sessions are in and how many prefixes are exchanged in both directions
After evaluating all that information, the task must give some overall conclusion or score on the VRF status on that switch. Since I work mostly with Cisco Nexus and Huawei CE ToRs, the operations must support both and must be expandable in the future.
In the beginning
So let's start by creating a new project in our projects directory, with some more directories inside.
mkdir -p -v ~/projects/nornir-vrf/{bindings,operations,inventory,utils,tests}
cd ~/projects/nornir-vrf/
touch {operations,tests,utils}/__init__.py
What are bindings? Depending on where you came from, you may know them as play(book)s, top files, declarations or run lists. A binding is a list of operations associated with some hosts, or in other words, a list of instructions plus a list of hosts to run them on. An operation is just one task (also called a resource or state in some CMs) in that list (and the list can consist of even a single one). And hosts, of course, go into the inventory. I like the explanation and comparison of terms in the UNIX and Linux System Administration Handbook [2]. I will use operation and task interchangeably from now on.
Next we need one more directory, but don't create it by hand: we need a virtual environment. If you don't know what that is, I highly recommend learning it [3].
python -m venv virtualenv
You can execute tasks inside a virtual environment in two ways: you can activate it, or you can call binaries from it directly. Most of the time I prefer the second way; I will show both for the first task. Let's update pip, which is
probably outdated. First way:
source virtualenv/bin/activate
pip install --upgrade pip
Now, when you execute Python or its libs, they will be called from the virtual environment. To deactivate it, do:
deactivate
Second way:
virtualenv/bin/pip install --upgrade pip
From now on I will show commands with the virtual environment activated. Next, let's install the packages that we will need in the near future:
pip install nornir pytest
Inventory
Nornir provides you with a basic inventory implementation allowing you to define hosts and groups in YAML, very similar to Ansible. In inventory/hosts.yml I will define four ToR switches, two Cisco and two Huawei, located in two different DCs:
---
cisco_dc1:
    nornir_host: 192.168.20.10
    groups:
        - dc_maybach
        - cisco_tors
cisco_dc2:
    nornir_host: 192.168.40.10
    groups:
        - dc_bentley
        - cisco_tors
huawei_dc1:
    nornir_host: 192.168.20.20
    groups:
        - dc_maybach
        - huawei_tors
huawei_dc2:
    nornir_host: 192.168.40.20
    groups:
        - dc_bentley
        - huawei_tors
In the inventory/groups.yml file I will define five groups:
---
defaults:
    domain: company.com
    asn: 64500
dc_maybach:
    dc_name: maybach
dc_bentley:
    dc_name: bentley
tors:
    role: tor_switch
cisco_tors:
    groups:
        - tors
    vendor: cisco
    lineup: nexus
    nornir_nos: nxos
huawei_tors:
    groups:
        - tors
    vendor: huawei
    lineup: ce
    nornir_nos: huawei_vrpv8
defaults will propagate all of its attributes to every group and host. Another example of inheritance is the tors group: cisco_tors and huawei_tors are members of it. So our huawei_dc1 host is a member of the dc_maybach, tors and huawei_tors groups. You can configure the password as nornir_password and the username as nornir_username here too, but I prefer not to put such sensitive info here. We will find another way later.
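If you want to convince yourself of how the inheritance resolves, a quick sketch (using the config file we will create in the next section) should show it; at least that's how I read the lookup rules:

from nornir.core import InitNornir

nrnr = InitNornir(config_file='config.yml')
host = nrnr.inventory.hosts['huawei_dc1']
print(host['dc_name'])  # 'maybach' - inherited from the dc_maybach group
print(host['lineup'])   # 'ce' - inherited from the huawei_tors group
print(host['domain'])   # 'company.com' - propagated from defaults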
First shot
Now that our inventory is in place, let's do some basic operations on those hosts. To initialize nornir we must supply it with a basic config. This can be done inline (inside our binding) or in a separate config file. I will go with the second option; inside the main directory create the file config.yml:
---
num_workers: 10
inventory: nornir.plugins.inventory.simple.SimpleInventory
SimpleInventory:
    host_file: "inventory/hosts.yml"
    group_file: "inventory/groups.yml"
The file is very simple: we just set the number of workers, which provides concurrency. Ten is more than enough for our task, but feel free to experiment with that number, especially if you work with a big number of hosts. Next, we choose the inventory handler, which is nornir's built-in SimpleInventory plugin, and provide it with the paths to the files. With that in place we can write our binding. Let's put something simple inside, just to see how nornir works and get some meaningful results. I will retrieve show version from the Nexus switches. bindings/tors_vrf_check.py will look like:
from nornir.core import InitNornir
from nornir.plugins.functions.text import print_result
from nornir.plugins.tasks.networking import netmiko_send_command

from utils.nornir_utils import nornir_set_credentials

nrnr = InitNornir(config_file="config.yml")
nornir_set_credentials(nrnr)
cisco_tors = nrnr.filter(lineup="nexus")
cmd = "show version"
result = cisco_tors.run(task=netmiko_send_command, command_string=cmd)
print_result(result)
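Before the walkthrough, a quick aside on that filter call: as far as I can tell, it accepts any inventory attribute, and several at once, for example:

# both filters rely on attributes we defined in the inventory
huawei_in_maybach = nrnr.filter(vendor='huawei', dc_name='maybach')
all_tors = nrnr.filter(role='tor_switch')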
First, we import the core class InitNornir, which instantiates the main object that does all the cool stuff (like running operations concurrently). That object makes nornir what it is. Next, we import a couple of plugins. Plugins are one of the coolest nornir features; they provide great modularity, so you aren't stuck with the default inventory or some limited set of actions. Look at the plugins list [4]: most of the functionality is provided as plugins! That is what allows nornir to have NAPALM and Netmiko actions pluggable, for example. We will use print_result to get the tasks' output and netmiko_send_command to query our hosts. The last import is our homebrew function, which we will see shortly. Next we instantiate Nornir, supplying it with our config file. The nornir_set_credentials function manipulates host credentials, setting usernames and passwords. Then we filter our inventory, for now choosing only the Nexus switches. Filtering can be done on different fields; here I chose lineup, which we set in the inventory previously. Finally, we run our task on the chosen set of hosts. A task can be as simple as "send a command using Netmiko" (like we do here), or it can be some function doing something more complex or grouping a number of operations together. And of course print_result will give us nice output indicating any errors, or device state changes if we push some config. The last missing piece of the puzzle is our nornir_utils, let's see it:
import getpass


def nornir_set_credentials(nornir, username=None):
    if not username:
        from os import getuid
        from pwd import getpwuid
        username = getpwuid(getuid())[0]
    password = getpass.getpass()
    for host in nornir.inventory.hosts.values():
        host.data["nornir_username"] = username
        host.data["nornir_password"] = password
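One portability note before I explain the function: the pwd module used above exists only on Unix-like systems. If that matters to you, getpass.getuser() could stand in (my suggestion, not part of the original utility):

# getuser() consults the LOGNAME/USER/LNAME/USERNAME environment
# variables first and only then falls back to the pwd database
username = username or getpass.getuser()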
I slightly altered an idea written by Kirk Byers [5]. If you don't follow his mailing list [6], you're missing some great stuff, by the way. The only function here sets the nornir_username and nornir_password values for every host in the inventory. It accepts a username as an argument or otherwise gets it from your system (caution: that part is only compatible with Unix-like systems). Then it asks you for a password. Now we need to alter the PYTHONPATH environment variable so Python can look up our packages and modules from the local directory.
export PYTHONPATH=.
And finally we can run our binding!
python bindings/tors_vrf_check.py
If the network is good and the credentials are right, you will see the show version output for both switches. Cool! Having some working code is probably a good point to start pushing it into git. You probably know git on some level, but just in case, here is a great place to start [7]. I will drop a .gitignore file inside the project directory to avoid tracking temporary/unneeded stuff; you can grab a useful example from GitHub [8], just make sure that you exclude the virtualenv and inventory directories. Now initialize the repository and alter some config if you must:
git init
git config user.name "FirstName LastName"
git config user.email "Email@Some.TLD"
With that in place, the project tree now looks like:
.
├── bindings
├── config.yml
├── .git
├── .gitignore
├── inventory
├── nornir.log
├── operations
├── tests
├── utils
└── virtualenv
And git status will output:
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore
	bindings/
	config.yml
	operations/
	tests/
	utils/

nothing added to commit but untracked files present (use "git add" to track)
So let's make the first commit here.
git add .
git commit -m "Initial commit"
The first nornir operation and binding work, and the repository is tracked by git. Great! Now it's time to do some real work.
Full binding
If I explained the whole development process and every line of code, this text could probably grow into a big chapter of some book. So let's cover the key points. To continue, we first must alter our binding file and link it with real operations. Most of the file will look like this in the end (I swapped some lines with … for brevity):
import json

from nornir.core import InitNornir
from nornir.core.task import Result
from nornir.plugins.functions.text import print_result

from utils.nornir_utils import nornir_set_credentials
from operations import check_vrf_status, check_interfaces, check_mac_table
from app_exception import UnsupportedNOS


def check_vrf(task, vrf_name):
    with open('operations/vendor_vars.json', 'r', encoding='utf-8') as jsonf:
        vendor_vars = json.load(jsonf)
    if task.host['nornir_nos'] == 'nxos':
        task.host['vendor_vars'] = vendor_vars['Cisco Nexus']
        nos_name = 'Cisco NX-OS'
    elif task.host['nornir_nos'] == 'huawei_vrpv8':
        task.host['vendor_vars'] = vendor_vars['Huawei CE']
        nos_name = 'Huawei VRPv8'
    else:
        raise UnsupportedNOS('{} is unsupported or bogus'.format(
            task.host['nornir_nos']))
    task.host['vrf_name'] = vrf_name
    task.run(task=check_vrf_status.find_vrf,
             name='Check if VRF exists on node')
    task.run(task=check_vrf_status.get_vrf_interfaces,
             name='Get VRF interfaces list')
    task.run(task=check_interfaces.check_interfaces_status,
             name='Check interfaces status for VRF')
    task.run(task=check_interfaces.get_interfaces_ip_addresses,
             name='Gather IP addresses for interfaces in VRF')
    task.run(task=check_interfaces.get_interfaces_ip_neighbors,
             name='Gather IP neighbors for interfaces in VRF')
    task.run(task=check_mac_table.get_interfaces_macs,
             name='Gather learned MAC for interfaces in VRF')
    task.run(task=check_vrf_status.check_vrf_bgp_neighbors,
             name='Get BGP neighbors in VRF and their state')
    result = 'Host {} running {}, VRF {} status:\n'.format(
        task.host.name, nos_name, task.host['vrf_name'])
    oper_up_interfaces = [x for x in task.host['interfaces']
                          if x.oper_status == 'up']
    result += '\t{} interfaces in VRF, {} of them operationally up\n'.format(
        len(task.host['interfaces']), len(oper_up_interfaces))
    ...
    return Result(task.host, result=result)


if __name__ == '__main__':
    nrnr = InitNornir(config_file='config.yml')
    nornir_set_credentials(nrnr)
    vrf_name = input('Enter VRF name > ')
    result = nrnr.run(task=check_vrf, vrf_name=vrf_name)
    for host in result:
        print_result(result[host][0])
I wrapped the kick-start code into a "name == main" block and added the topmost task there. Now, if you run the binding, it will ask you for a password first (our nornir_set_credentials) and then for the VRF name to check. Next, you can see how the run method references the check_vrf function from the same file. That function automatically receives a task object (nornir.core.task.Task) and, in the same manner, runs the other tasks, becoming an umbrella operation. To work
properly (and be filled with nornir magic) they must be executed through task.run (or nrnr.run for the topmost one) and not called directly like simple functions. And I don't need to assign the results of those tasks to any variable, because the outer assignment (nrnr.run) will eventually grab all of them. All the interesting information about a host and its VRF status is assigned to task.host keys, as we will see soon.
If we print the results as before, we get the results of every operation that ran, which sometimes is what we want. But here I want just the overall conclusion on the VRF status, so I extract only the result of the topmost operation for every host and print that. In our topmost operation you can see that, after running all the tasks, I built the result string for that topmost operation from the info stored in task.host. The total results will be nested like:
|-host1
| |
| |-topmost operation
| |-second operation
| |-third operation
|
|-host-2
| |
| |-topmost operation
| |-second operation
| |-third operation
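And if you ever need the per-operation details after all, the nested structure can be walked by hand; a hedged sketch:

# result is keyed by host name; each value is a list-like object with
# the topmost operation's result first, then the results of its subtasks
for host_name, host_results in result.items():
    print(host_name)
    for single_result in host_results:
        print('  {}: failed={}'.format(single_result.name,
                                       single_result.failed))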
Let's leave that vendor_vars.json alone for now. What do you need to know about the host object nested inside the task object? It's pretty cool and somewhat complex. An operation receives a task with one host nested inside. Thanks to the built-in concurrency, instances of the same task work on different hosts simultaneously, so you don't need to bother yourself with that: you just write your operations to work with one host. The host object can act just like a dictionary, so you can access every attribute it has through keys, like task.host['role']. It holds every attribute assigned directly or through inheritance from groups. Just don't fall into the trap with task.host.data: it acts like a dictionary too, but it only contains attributes assigned to the host directly! Moreover, you can store different values in task.host and pass them between tasks that way, which we use with vrf_name here.
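To make that trap concrete, here is a hedged illustration with attribute names taken from our inventory:

def demo(task):
    # dictionary-style access resolves inheritance from groups/defaults
    print(task.host['role'])           # 'tor_switch', from the tors group
    # .data holds only attributes assigned to the host directly,
    # so the same key may be missing here
    print(task.host.data.get('role'))  # None for our ToRs
    # task.host also works as a scratchpad shared between tasks
    task.host['vrf_name'] = 'Galaxy'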
Returning to that vendor_vars.json file: what is it about? To interact with a device CLI we must use show commands, which of course differ between vendors. Constants are better declared separately from the main code, and big strings which can change based on some condition (per device) are better not hard-coded. So I put such things into a JSON file. It looks like this:
{ "Cisco Nexus": { "show vrf": "show vrf", "vrf regexp": "\\n{}\\s+\\d{{1,2}}", "show vrf interfaces": "show vrf {} interface", "show interfaces brief": "show interface brief", "show ipv4 interface": "show ip interface {}", "show ipv6 interface": "show ipv6 interface {}", "show ipv4 neighbors interface": "show ip arp {} vrf {}", "show ipv6 neighbors interface": "show ipv6 neighbor {} vrf {}", "show mac table interface": "show mac address-table interface {}", "show mac table vlan": "show mac address-table vlan {}", "show bgp ipv4 vrf neighbors": "show bgp vrf {} ipv4 unicast neighbors", "show bgp ipv6 vrf neighbors": "show bgp vrf {} ipv6 unicast neighbors" }, "Huawei CE": { "show vrf": "display ip vpn-instance", "vrf regexp": "\\s{{2}}{}\\s+\\d+:", "show vrf interfaces": "display ip vpn-instance {} interface", "show interfaces brief": "display interface brief", "show ipv4 interface": "display ip interface {}", "show ipv6 interface": "display ipv6 interface {}", "show ipv4 neighbors interface": "display arp interface {}", "show ipv6 neighbors interface": "display ipv6 neighbors {}", "show mac table interface": "display mac-address interface {}", "show mac table vlan": "display mac-address vlan {}", "show bgp ipv4 vrf neighbors": "display bgp vpnv4 vpn-instance {} peer verbose", "show bgp ipv6 vrf neighbors": "display bgp vpnv6 vpn-instance {} peer verbose" } }
As you can see, most of it is vendor-specific show commands with {} placeholders inside, allowing us to substitute variables with Python's string format method. But there is also a regexp for matching VRF names in output. Both vendor sets have identical keys for convenient fetching. Now if I must change something, I don't need to mess with the code. But of course I track this file in git too.
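To illustrate how the templates expand (the VRF name Galaxy is just an example), and why the regexps carry doubled braces, which protect the repetition counts from format():

import json

with open('operations/vendor_vars.json', encoding='utf-8') as jsonf:
    vendor_vars = json.load(jsonf)

cmds = vendor_vars['Huawei CE']
print(cmds['show vrf interfaces'].format('Galaxy'))
# -> display ip vpn-instance Galaxy interface
print(cmds['vrf regexp'].format('Galaxy'))
# -> \s{2}Galaxy\s+\d+: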
The last interesting piece in the binding file is the exception, more precisely UnsupportedNOS. Nothing really special about it, apart from the fact that it inherits from another custom exception. The app_exception.py file contains:
class AppException(Exception):
    '''Global application exception to inherit other exceptions from.'''
    pass


class UnsupportedNOS(AppException):
    '''Indicates that a method received a NOS type/version that it
    currently doesn't support.'''
    pass
If I introduce more exceptions later, I can inherit all of them from AppException, which will allow me to catch any of my custom exceptions with a single except AppException block. A very useful trick.
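A quick sketch of the trick in action (validate_nos here is a hypothetical caller that may raise UnsupportedNOS):

try:
    validate_nos(task.host['nornir_nos'])
except AppException as err:
    # catches UnsupportedNOS and any future AppException descendant
    print('application-level failure: {}'.format(err))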
Operations
With that covered, let's look at a couple of operations. For example, this is the code of get_vrf_interfaces:
def get_vrf_interfaces(task):
    connection = task.host.get_connection('netmiko')
    output = connection.send_command(
        task.host['vendor_vars']['show vrf interfaces'].format(
            task.host['vrf_name']))
    if task.host['nornir_nos'] == 'nxos':
        if task.host['vrf_name'] not in output:
            interfaces_list = []
        else:
            interfaces_list = [SwitchInterface(
                x.split(' ')[0], mode='routed') for x in output.strip().split(
                    '\n')[1:]]
    elif task.host['nornir_nos'] == 'huawei_vrpv8':
        if 'Interface Number : 0' in output:
            interfaces_list = []
        else:
            start_mark = 'Interface list : '
            start = output.index(start_mark)
            interfaces_list = [SwitchInterface(
                x.strip(' ,'), mode='routed') for x in output[start+len(
                    start_mark):].strip().split('\n')]
    else:
        raise UnsupportedNOS(
            'task received unsupported NOS - {}'.format(
                task.host['nornir_nos']))
    task.host['interfaces'] = interfaces_list
    if len(task.host['interfaces']) == 0:
        return Result(host=task.host, failed=True,
                      result='No interfaces assigned to VRF {}'.format(
                          task.host['vrf_name']))
    else:
        return Result(
            host=task.host,
            result='Interfaces bound to VRF {}:\n\t{}'.format(
                task.host['vrf_name'],
                '\n\t'.join([x.name for x in interfaces_list])))
Let's go through it. First we get a connection to the host. "Get" doesn't mean initiate here: connections are handled very wisely by nornir. You can connect to a host using different methods (NAPALM, Netmiko or Paramiko are available by default); here I used Netmiko. The get_connection method tries to fetch an already built connection to that host from the task.host object. If there is none, it builds a new one and assigns it to the object. That way you don't need to implement any connection control logic or bother yourself with passing connections between tasks.
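In other words, repeated calls inside one operation run should hand back the same live session (my illustration of the behaviour, not code from the project):

conn_a = task.host.get_connection('netmiko')
conn_b = task.host.get_connection('netmiko')
assert conn_a is conn_b  # no second SSH session gets built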
Next, we scrape the output of the show vrf interfaces command from the switch. Here we don't care about the actual command; that's handled by our JSON mapping (vendor_vars). Then different logic applies per NOS (network operating system) to extract the interfaces assigned to that VRF. For every interface, an instance of the SwitchInterface object is created (which I will not show; you can find it in utils/switch_objects.py), and all of them are assigned to task.host as a list. If a host with an unsupported NOS somehow slips into this operation, an exception is raised. Note the last part with the check of the interfaces list length. If there are no interfaces assigned to the VRF, the task fails (failed=True in the Result instantiation), which by default prevents any other tasks from running on that host. If you print the result of this operation, it will look like:
---- Get VRF interfaces list ** changed : False -------------------------------- INFO
Interfaces bound to VRF Galaxy:
	Vlan1415
	Ethernet1/30
	Ethernet1/32
Next, let's have a look at the check_interfaces_status operation. Here is its code:
def check_interfaces_status(task, interface_list=None):
    if interface_list:
        task.host['interfaces'] = [SwitchInterface(x) for x in interface_list]
    connection = task.host.get_connection('netmiko')
    result = 'Interfaces status:\n'
    interfaces_brief_output = connection.send_command(task.host['vendor_vars'][
        'show interfaces brief'])
    for interface in task.host['interfaces']:
        if task.host['nornir_nos'] == 'nxos':
            interface_name = cisco_compact_name(interface.name)
        else:
            interface_name = interface.name
        brief_line_start = interfaces_brief_output.index(interface_name)
        # 'find' will cover end of output (last line) situations
        brief_line_end = interfaces_brief_output.find('\n', brief_line_start)
        brief_line = interfaces_brief_output[brief_line_start:brief_line_end]
        if task.host['nornir_nos'] == 'nxos':
            if ' up ' in brief_line:
                interface.admin_status = 'up'
                interface.oper_status = 'up'
            elif 'Administratively down' in brief_line:
                interface.admin_status = 'down'
                interface.oper_status = 'down'
            else:
                interface.admin_status = 'up'
                interface.oper_status = 'down'
        elif task.host['nornir_nos'] == 'huawei_vrpv8':
            phy_status = re.search(r'{}(\(.+\))?\s+(\*?(down|up))'.format(
                interface.name), brief_line).group(2)
            if phy_status == '*down':
                interface.admin_status = 'down'
                interface.oper_status = 'down'
            elif phy_status == 'down':
                interface.admin_status = 'up'
                interface.oper_status = 'down'
            else:
                interface.admin_status = 'up'
                interface.oper_status = 'up'
        else:
            raise UnsupportedNOS('task received unsupported NOS - {}'.format(
                task.host['nornir_nos']))
        result += 'Interface {} is in {}/{} state\n'.format(
            interface.name, interface.admin_status, interface.oper_status)
    return Result(host=task.host, result=result)
Pretty much the same thing here. We get a connection, issue the show interfaces brief command and parse its output according to the vendor rules. As a result, the admin_status and oper_status attributes of the SwitchInterface objects get assigned. What's worth mentioning here is the interface_list argument. If we execute this operation after get_vrf_interfaces, we will have a list of interfaces attached to task.host['interfaces'], but if we execute it on its own we will not. So we
can supply a list of strings with interface names, and the task will build the appropriate interface instances to work with. You can also see a call to the cisco_compact_name function here. Its purpose is to shorten interface names, because Nexus does not use the full versions in show interface brief output. The results of this operation will look like:
---- Check interfaces status for VRF ** changed : False ------------------------ INFO
Interfaces status:
Interface Vlan1415 is in up/up state
Interface Ethernet1/30 is in up/up state
Interface Ethernet1/32 is in up/up state
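By the way, I haven't shown cisco_compact_name; my guess at its shape would be something like this (the real one lives in utils/):

def cisco_compact_name(full_name):
    # Nexus "show interface brief" prints Eth1/30, not Ethernet1/30
    for long_form, short_form in (('Ethernet', 'Eth'),
                                  ('port-channel', 'Po')):
        if full_name.startswith(long_form):
            return short_form + full_name[len(long_form):]
    return full_name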
Finally, the overall result of the topmost operation looks like:
vvvv check_vrf ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
Host cisco_dc1 running Cisco NX-OS, VRF Galaxy status:
	8 interfaces in VRF, 8 of them operationally up
	0/8 v4/v6 addresses present (except link-locals)
	0/1603 v4/v6 neighbors learned on interfaces
	182 MAC addresses learned in VRF VLANs
	4 BGP neighbors configured, 4 of them in established state, 4 of them sent prefixes
	VRF looks good!
Testing
As this post has already become very long, I will not dive deep into testing; you can look at the actual code for yourself. I did unit testing with a 3rd-party library, pytest, which I find very cool. To ease the testing task I use a fixture (a function that runs once and lets any test case use its result, found in tests/conftest.py) to get the vendor_vars.json contents, and a function which builds fake task instances (tests/helpers.py). The create_fake_task function uses a Mock object to provide CLI outputs without connecting to real network nodes; it just reads files from tests/cmd_outputs. You can run the following command to see if the tests pass:
python -m pytest
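For a flavour of what the fake task helper could look like, here is a sketch under my assumptions (the real code is in tests/helpers.py):

from unittest.mock import Mock


class FakeHost(dict):
    '''Dict-like stand-in for nornir's Host object.'''
    def __init__(self, attrs, connection):
        super().__init__(attrs)
        self._connection = connection

    def get_connection(self, name):
        return self._connection


def create_fake_task(cmd_output, vendor_vars, vrf_name, nos):
    # any send_command call will simply return the canned output
    connection = Mock()
    connection.send_command.return_value = cmd_output
    task = Mock()
    task.host = FakeHost({'vendor_vars': vendor_vars,
                          'vrf_name': vrf_name,
                          'nornir_nos': nos}, connection)
    return task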
To do
There's always room for improvement. One improvement will of course be supporting more platforms and code versions. And if I don't want the operations code to become hugely ugly, it's probably better to split the vendor-specific logic into separate functions, as sketched below. From an operations point of view, I would like to add retrieval of a switch's LLDP neighbours in the future. On the testing side of things, it would be great to introduce some functional testing with a NOS instantiated as a VM and the operations run against it. Unfortunately, that's a rather difficult task with NX-OS and almost impossible with VRPv8. Of course, there is the possibility of doing this with physical equipment in a lab. And hey! This is already a great base for producing other bindings and operations! You can find all the final code on GitHub [9].
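Regarding splitting the vendor logic, one way I see it is a dispatch table of small per-vendor parsers (a sketch of the idea, with names of my own invention):

def _parse_nxos_vrf_interfaces(output, vrf_name):
    if vrf_name not in output:
        return []
    return [line.split(' ')[0] for line in output.strip().split('\n')[1:]]


def _parse_vrpv8_vrf_interfaces(output, vrf_name):
    if 'Interface Number : 0' in output:
        return []
    start_mark = 'Interface list : '
    start = output.index(start_mark) + len(start_mark)
    return [line.strip(' ,') for line in output[start:].strip().split('\n')]


VRF_INTERFACE_PARSERS = {
    'nxos': _parse_nxos_vrf_interfaces,
    'huawei_vrpv8': _parse_vrpv8_vrf_interfaces,
}

The operation itself would then shrink to a dictionary lookup plus a raise of UnsupportedNOS for unknown keys.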
Conclusion
It looks like nornir provides a viable alternative to CM systems, especially for those of us who know how to write code. And watching articles about it popping up all over the Internet, I believe it can become quite popular in the near future. I wish the nornir team good luck in their endeavours, and I am starting to think about which tasks to automate with it next.
Links
[1] nornir documentation
[2] UNIX and Linux System Administration Handbook
[3] virtualenv documentation
[4] nornir plugins documentation
[5] nornir_utilities on ktbyers GitHub
[6] Python for network engineers
[7] git book
[8] Python.gitignore on GitHub
[9] code repository on GitHub