Saturday 22 September 2012

UK Phone Numbers

I've been wrestling with a list of predominantly UK phone numbers for the last couple of days.  The data has come from a list of all the calls made to and from an organisation in the past year.  In particular, I want to identify the dialling code so that I can identify what geographical areas people are calling from.

First port of call is usually Google - I certainly don't want to write some code if I can just copy something that someone else has already done.  But sadly every entry I could find was about US phone numbers, which are easy peasy - area dialling codes are 3 digit across North America.  How rational!  Here in the UK dialling codes may be three, four, five or six digits long.  So if you have North American phone numbers to sort out, most of this article will be irrelevant.

Let's start with a bit of research - here and here. "In the United Kingdom, area codes are two, three, four, or, rarely, five digits long (after the initial zero). Regions with shorter area codes, typically large cities, permit the allocation of more telephone numbers as the local number portion has more digits. Local customer numbers are four to eight figures long. The total number of digits is ten, but in a very few areas the total may be nine digits (after the initial zero). The "area code" is also referred to as an "STD (code)" (subscriber trunk dialling) or a "dialling code" in the UK."  So, basically, it's a tangle of different standards.  OK, so I need some code which checks the first few digits and identifies which one is which. 


Data Cleansing
But before we get on to that, let's do some data cleansing.  I have a lot of what look like perfectly good phone numbers with a 92 prefix.  Like most firms, it's 9 to dial out, so I don't know where 92 comes from.  It's easy to deal with though- just a straightforward REPLACE() function.

replace(LEFT(strddi,2), '92', '')
means when the leftmost two characters of the DDI string are 92, replace them with a blank.

-- identify Direct Dialled numbers with 92 prefix - apparent error
update  [CUSTOMER].[PhoneCalls]
Set    strddi = replace(LEFT(strddi,2), '92', '')
WHERE LEFT(strddi,2) = '92'
AND         strDDIAreaCode IS NULL
AND         LEN(strddi) > 5;

There's another issue with incoming mobile phone numbers - in the UK, these start with 07, but somehow they have been recorded as starting with 7.  Replace does the job.

-- identify mobile numbers omitting 0 prefix - apparent error
-- change to UK standard 0
update  [CUSTOMER].[PhoneCalls]
Set    strcli = replace(LEFT(strCLI,1), '7', '07')
WHERE LEFT(strCLI,1) = '7'
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) = 10;


Then we have the incoming calls from UK numbers, but specifying the full international dialling code 0044.  Fine if you are based somewhere else, but really not necessary if you are in the UK and your caller is in the UK.

-- identify UK numbers with an international code 0044
-- change to UK standard 0
update  [CUSTOMER].[PhoneCalls]
Set    strcli = replace(LEFT(strcli,4), '0044', '0')
WHERE LEFT(strcli,4) = '0044'
AND         strCLIAreaCode IS NULL
AND         LEN(strcli) > 5;



Having dealt with UK codes that think they are worldwide, I can now go on and identify the legitimate international codes.  I could pick out the French calls (0033), German calls (0049) and so on, but I only have a very small percentage of non-UK calls so I'm just going to treat all these Johnny Foreigners the same and lump them together as "International" calls.  If you do want to be more accurate with your international calls, click here for a list of the codes.

-- identify  international codes 00
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode = 'International'
WHERE LEFT(strcli,2) = '00'
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) > 5;



And use the same technique to identify Mobile numbers:

-- identify mobile phone numbers 07
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode = 'Mobile'
WHERE LEFT(strcli,2) = '07'
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) > 5;


Freefone numbers are free only to people calling from Landlines, although I understand that there are plans to make them free to people using mobile  phones too.

-- identify Freefone  phone numbers 0800 etc
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode = 'Freefone'
WHERE LEFT(strcli,4) in ('0800', '0500', '0808')
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) > 5;
 

There is a string of special rate numbers beginning with 08 - once upon a time you could identify 0845 as local rate and 0870 as national rate, but the list has proliferated and now you can't really tell how much it is going to cost.   

-- identify special rate phone numbers 0845 0870 etc
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode = 'Special rate'
WHERE LEFT(strcli,2) = '08'
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) > 5;


One thing you can be sure of - a call to an 09 number is going to be outrageously expensive...


-- identify premium rate phone numbers 09
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode = 'Premium rate'
WHERE LEFT(strcli,2) = '09'
AND         strCLIAreaCode IS NULL
AND         LEN(strCLI) > 5;


Identifying UK Dialling Codes
So - on to the main point of this article.  UK dialling codes - area codes - whatever you want to call them, allow you to identify where in the country a caller is based.  Aberdeen is 01224, York is 01904.  But very often the dialling code and the subscriber number are held in a single field e.g. 01169158424.
Taking this example, it could in theory be divided thus:
011        6915 8424
0116      915 8424
01169    158424
011691  58424

A dialling code can be anything from 3 digits to 6 digits - a subscriber number can be anything from 4 to 8 digits.

So here's how to tackle the problem.  There are only five cities with three digit dialling codes, ten with four digits, and 12 with six - the rest have five.  So thanks to a bit of research in Wikipedia (see the links I posted earlier) I was able to construct the following CASE statement:


-- UK Dialling codes may have 3, 4 , 5 or 6 digits
-- pick appropriate code
--CLI
 update  [CUSTOMER].[PhoneCalls]
set strCLIAreaCode =
      (
            CASE
            -- 3 digit dialling codes e.g. London, Belfast
            WHEN LEFT(STRcLI,3) IN 
                 ('020', '023', '024', '028', '029' )
                  THEN LEFT(strcli,3)
                 
            -- 4 digit dialling codes e.g. Bristol, Leicester
            WHEN LEFT(STRcLI,4) IN 
                 ('0118', '0117', '0116', '0115', '0114', '0113',
                  '0121', '0131', '0141', '0151', '0161', '0191')
                  THEN LEFT(strcli,4)
                 
            -- 6 digit dialling codes e.g. Langholm, Keswick
            WHEN LEFT(STRcLI,6) IN 
                 ('013873', '015242', '015394', '015395',
                  '015396', '016973', '016974', '016977',
                  '017683', '017684', '017687', '019467' )
                  THEN LEFT(strcli,6)
                                   
            -- The remaining majority of codes are 5 digit
            ELSE  LEFT(strcli,5)
      END
      )

WHERE
       strcli <> 'WITHHELD'
       AND LEN(strcli) > 5
      AND strCLIAreaCode IS null;
       


Looking at 01169158424, it's easy to see that the first four digits match the four digit option, so the area code part of this number is 0116 - which represents Leicester

Finally, a bit of tidying up:
      
-- Remove any remaining nulls in area code fields
-- nulls remaining represent internal codes or unidentifiable
-- CLI
update  [CUSTOMER].[PhoneCalls]
set   strCLIAreaCode =  'N/A'
WHERE strCLIAreaCode IS NULL;


Incidentally, the code shown here is half the code I wrote - I had to do the job for both inbound (CLI - Caller Line Identification) numbers and outbound (DDI - Direct Dial In).  the code is essentially the same so I haven't troubled you with it. 

I hope you find this useful!


No comments:

Post a Comment