It’s very easy to log the visitor’s IP address in a web application. In PHP, for example, a simple $_SERVER[‘REMOTE_ADDR’] returns the desired value.
But the IP address alone doesn’t tell us much about the person, sometimes it’s not even a person, but a search robot, who just entered the website. More interesting would be, where he (the person or the machine) lives.
I’d like to show you how MySQL 5 can help you to easily query the country from some source data that can be downloaded for free from MaxMind at http://www.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip.
First I’d recommand you to create a new database, if you have the privilege to do so. I have called my database geoips. Download the file mentioned above and extract the file GeoIPCountryWhois.csv from this zip file. The file looks like this:
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
...
First we have to get this data into a new table, which I call ip_ranges. We create it in the geoips database using this CREATE command:
CREATE TABLE ip_ranges (
ip1_temp char(16),
ip2_temp char(16),
ip_from int unsigned NOT NULL PRIMARY KEY,
ip_to int unsigned NOT NULL UNIQUE,
code char(2) NOT NULL,
country varchar(100) NOT NULL,
INDEX (code)) ENGINE = InnoDB;
I didn’t take much care about the ip1_temp and ip2_temp columns, because we only need them temporarily. We will make our searches based on the values in the third and forth column of the data in the csv file. Later we will put the code and country pairs into a separate table to normalize the data. Because we will put a foreign key on the code column, we put an index on it and we use the InnoDB storage engine. This isn’t a must, but it’s a good way to also show you how to accomplish this task ;-).
To import the csv file into the table we just created we have two possibilities, either via a SQL command or via the mysqlimport program that’s included in MySQL. I’ll show you both ways. You need the FILE privilege for both variants.
Open a console window and change to the directory where the GeoIPCountryWhois.csv file resides. Start your mysql client program with the command
mysql -h [host] -u [username] -p[password] geoips
Then execute this SQL statement:
LOAD DATA LOCAL INFILE 'GeoIPCountryWhois.csv'
INTO TABLE ip_ranges
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Instead of starting your mysql client program, you can also use the mysqlimport program. Therefore, the name of the input file must be equal to the table name, so we rename the file GeoIPCountryWhois.csv to ip_ranges.csv.
move GeoIPCountryWhois.csv ip_ranges.csv (in Windows)
mv GeoIPCountryWhois.csv ip_ranges.csv (in Linux)
Then we can use this command (please write it in one line) to import the csv file:
mysqlimport -h [host] -u [username] -p[password] --local
--fields-terminated-by="," --fields-enclosed-by="\""
--lines-terminated-by="\n" geoips ip_ranges.csv
You should now have somthing like 74,000 records in your ip_ranges table.
Change back to your mysql client and the geoips database. We can now remove the columns ip1_temp and ip2_temp from the ip_ranges table, because we won’t need them for searching. If you want to leave them to verify the results, that’s no problem either.
ALTER TABLE ip_ranges
DROP ip1_temp,
DROP ip2_temp;
Now we have an interesting task ahead of us. We want to move a list of country codes and country names into a separate table called ip_countries. That’s very easy to do with only one command.
CREATE TABLE ip_countries
SELECT DISTINCT code, country
FROM ip_ranges
ORDER BY code;
We can now
* add a primary key to the code column of the ip_countries table
* remove the country column from the ip_ranges table
* add a foreign key constraint to the code column of the ip_ranges table, which refers to the code column of the ip_countries table.
We need two more SQL commands to do this:
ALTER TABLE ip_countries ADD PRIMARY KEY (code);
ALTER TABLE ip_ranges
DROP country,
ADD FOREIGN KEY (code)
REFERENCES ip_countries(code);
Now we have all the data we need to start writing a User Defined Function which queries the country based on an IP address that we pass as parameter. Of course we can’t pass the IP address as such a number like we have the values in the ip_from and ip_to columns of the ip_ranges table. So we need to calculate the corresponding number from the IP address.
The page http://www.maxmind.com/app/csv tells us how this value can be calculated. We split up the 4 parts of our IP address and calculate it like this (ip1 = 1st part, ip2 = 2nd part, ip3 = 3rd part, ip4 = 4th part):
value = 16777216 x ip1 + 65536 x ip2 + 256 x ip3 + ip4
I use my current IP address 62.46.14.132 to try this out and get the value 1,043,205,764.
As soon we have this value, we can query the country like this:
SELECT b.country
FROM ip_ranges a INNER JOIN ip_countries b
ON a.code = b.code
WHERE ip_from = 1043205764
The result is Austria, which is the place where I live.
Even though we query the data from two tables with one storing more than 74,000 records, the query only takes about 0.0025 seconds on my machine. That’s because of the indexes we’ve set. Prefixing the keyword EXPLAIN to the last query shows us that MySQL was able to query the data very efficiently.
mysql> EXPLAIN SELECT b.country
-> FROM ip_ranges a INNER JOIN ip_countries b
-> ON a.code = b.code
-> WHERE ip_from ip_to >= 1043205764 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
type: range
possible_keys: PRIMARY,ip_to,code
key: PRIMARY
key_len: 4
ref: NULL
rows: 2322
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 6
ref: geoips.a.code
rows: 1
Extra:
2 rows in set (0.00 sec)
Finally, we write the User Defined Function, which puts this all together:
DELIMITER $$
DROP FUNCTION IF EXISTS `geoips`.`getCountry`$$
CREATE FUNCTION `geoips`.`getCountry` (pIp CHAR(16))
RETURNS VARCHAR(100)
BEGIN
DECLARE _ip1, _ip2, _ip3, _ip4, _ip_value INT UNSIGNED;
DECLARE _country VARCHAR(100);
SELECT substring_index(pIp, '.', 1),
substring_index(pIp, '.', -3),
substring_index(pIp, '.', -2),
substring_index(pIp, '.', -1) INTO _ip1, _ip2, _ip3, _ip4;
SELECT 16777216 * _ip1 + 65536 * _ip2 + 256 * _ip3 + _ip4
INTO _ip_value;
SELECT b.country
FROM ip_ranges a INNER JOIN ip_countries b
ON a.code = b.code
WHERE ip_from = _ip_value
INTO _country;
RETURN _country;
END$$
DELIMITER ;
I used the substring_index function to extract the 4 parts of the IP address. With ‘help substring_index’ we can ask the mysql client to give us an explanation of this function:
mysql> help substring_index;
Name: 'SUBSTRING_INDEX'
Description:
SUBSTRING_INDEX(str,delim,count)
Returns the substring from string str before count
occurrences of the delimiter delim.
If count is positive, everything to the left of the final
delimiter (counting from the left) is returned.
If count is negative, everything to the right of the final
delimiter (counting from the right) is returned.
Examples:
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
-> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
-> 'mysql.com'
Finally, a little test whether the getCountry UDF really works:
mysql> SELECT getCountry('62.46.14.132');
+----------------------------+
| getCountry('62.46.14.132') |
+----------------------------+
| Austria |
+----------------------------+
1 row in set (0.00 sec)
Great, it does – and that very quickly :-)!