Introduction to Python (A crash course)¶

For those of you who know Python, this aims to refresh your memory. For those of you who don't know Python, but do know programming, this class aims to give you an idea of how Python is similar to, or different from, your favorite programming language.

Printing and reading¶

Printing and reading from standard IO

#  Writing to standard out:

print ("Hello World!")

Hello World!

#  Reading from standard input and writing to standard output

name = input("What is your name? ")
print("Hello ",end = '')
print(name)

What is your name? Panayiotis
Hello Panayiotis
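
The two print calls above can be collapsed into one. The sketch below shows the sep and end keyword arguments of print, and an f-string (available in Python 3.6 and later). The variable name is hard-coded here, instead of being read with input, so that the snippet runs non-interactively.

```python
name = "Panayiotis"  # hard-coded stand-in for input("What is your name? ")

# print accepts several arguments; sep is placed between them (default: one space)
print("Hello", name)
print("Hello", name, sep=", ", end="!\n")

# An f-string builds the same text as a single string
greeting = f"Hello {name}"
print(greeting)
```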

Data types¶

Basic data types:

Strings
Integers
Floats
Booleans

These are all objects in Python.

#String
a = "apple"
type(a)
#print(type(a))

str

#Integer 
b = 3
type(b)
#print(type(b))

int

#Float  
c = 3.2
type(c)
#print(type(c))

float

#Boolean
d = True
type(d)
#print(type(d))

bool

a+b

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-f1d53b280433> in <module>()
----> 1 a+b

TypeError: Can't convert 'int' object to str implicitly

a+str(b)

'apple3'
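
Python does not convert between strings and numbers implicitly, so the conversion has to be spelled out in whichever direction is needed. A minimal sketch using the built-in str, int, and float constructors (the variable names are illustrative):

```python
a = "apple"
b = 3

joined = a + str(b)        # int -> str, then concatenate
total = int("14") + b      # str -> int, then add
ratio = float("3.2") * 2   # str -> float, then multiply
formatted = f"{a}{b}"      # an f-string converts the values to text for us

print(joined, total, ratio, formatted)
```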

Unlike Java and other statically typed languages, Python does not require you to declare the types of variables explicitly.

In Python 3, the / operator always yields a float, even when both operands are integers.

14/b

4.666666666666667

14/c

4.375

If you want integer division, use the // operator.

14//b

4
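
A small sketch of how // behaves beyond the example above: it rounds toward negative infinity (floor) rather than toward zero, and the built-in divmod returns the quotient and the remainder together. The variable names are illustrative.

```python
q = 14 // 3        # floor division on integers gives an int
r = 14 % 3         # remainder
neg = -14 // 3     # floors toward negative infinity, so this is -5, not -4
fq = 14 // 3.2     # with a float operand the result is a float

print(q, r, neg, fq)
print(divmod(14, 3))   # (quotient, remainder) in one call
```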

Strings¶

String manipulation will be very important for many of the tasks we will do. Therefore let us play around a bit with strings.

#Concatenating strings

a = "Hello"  # String
b = " World" # Another string
print (a + b)  # Concatenation

Hello World

# Slicing strings

a = "World"

print (a[0])
print (a[-1])
print ("World"[0:4])
print (a[::-1])

W
d
Worl
dlroW

# Popular string functions
a = "Hello World"
print ("-".join(a))
print (a.startswith("Wo"))
print (a.endswith("rld"))
print (a.replace("o","0").replace("d","[)").replace("l","1"))
print (a.split())
print (a.split('o'))

H-e-l-l-o- -W-o-r-l-d
False
True
He110 W0r1[)
['Hello', 'World']
['Hell', ' W', 'rld']

Strings are an example of an immutable data type. Once you instantiate a string, you cannot change any of its characters.

string = "string"
string[-1] = "y"  #Here we attempt to assign the last character in the string to "y"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-b377f6c05723> in <module>()
      1 string = "string"
----> 2 string[-1] = "y"  #Here we attempt to assign the last character in the string to "y"

TypeError: 'str' object does not support item assignment
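
Since a string cannot be modified in place, the usual approach is to build a new string instead. A minimal sketch, with illustrative variable names:

```python
string = "string"

# Slice off the last character and concatenate the replacement
new_string = string[:-1] + "y"
print(new_string)

# Alternative: convert to a list (which is mutable), edit, and join back
chars = list(string)
chars[-1] = "y"
joined = "".join(chars)
print(joined)

# The original string is unchanged
print(string)
```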

Whitespace in Python¶

Python uses indents and whitespace to group statements together. To write a short loop in Java, you might use:

for (int i = 0; i < 5; i++){
       System.out.println("Hi!");
 }

Python does not use curly braces as Java and C do; the same loop is written in Python as follows:

for i in range(5):
    print("Hi!")

If you have nested for-loops, there is a further indent for the inner loop.

for i in range(3):
    for j in range(3):
        print (i, j)
    
    print ("This statement is within the i-loop, but not the j-loop")

0 0
0 1
0 2
This statement is within the i-loop, but not the j-loop
1 0
1 1
1 2
This statement is within the i-loop, but not the j-loop
2 0
2 1
2 2
This statement is within the i-loop, but not the j-loop

Lists, Tuples, Sets and Dictionaries¶

Numbers and strings alone are not enough! We need data types that can hold multiple values.

Lists:¶

Lists are mutable, that is, they can be altered after they are created. A list is a collection of data, and that data can be of differing types.

groceries = []

# Add to list
groceries.append("oranges")  
groceries.append("meat")
groceries.append("asparagus")

# Access by index
print (groceries[2])
print (groceries[0])

# Find number of things in list
print (len(groceries))

# Sort the items in the list
groceries.sort()
print (groceries)

# List Comprehension
veggie = [x for x in groceries if x != "meat"]  # compare values with !=, not identity with "is not"
print (veggie)

# Remove from list
groceries.remove("asparagus")
#groceries.pop()
print (groceries)

#The list is mutable
groceries[0] = 2
print (groceries)

asparagus
oranges
3
['asparagus', 'meat', 'oranges']
['asparagus', 'oranges']
['meat', 'oranges']
[2, 'oranges']

groceries.sort()
print (groceries)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-61508be098f7> in <module>()
----> 1 groceries.sort()
      2 print (groceries)

TypeError: unorderable types: str() < int()

L = ['x',1,3,'y']
print(L.pop())
print(L.pop(0))

y
x

# lists are objects
L = [2,5,1,4]
X = L
L.sort()
print (X)
L.append(3)
print(X)
L = sorted(L)
print(L)
print(X)

[1, 2, 4, 5]
[1, 2, 4, 5, 3]
[1, 2, 3, 4, 5]
[1, 2, 4, 5, 3]
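
In the example above, X = L does not copy the list; both names refer to the same object until L is rebound by sorted. If an independent copy is wanted, a shallow copy is one option, as in this sketch (the names alias and copy are illustrative):

```python
L = [2, 5, 1, 4]
alias = L          # same list object under a second name
copy = L.copy()    # shallow copy: a new list with the same elements (L[:] also works)

L.sort()           # sorts in place, so the alias sees the change

print(alias)       # follows L
print(copy)        # keeps the original order
print(alias is L, copy is L)
```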

#slicing works for lists as for strings
print (L[1:-1])
print (L[2:])
print(L[:-2])
print(L[1:-1:2])

[2, 3, 4]
[3, 4, 5]
[1, 2, 3]
[2, 4]

List Comprehension¶

Recall the mathematical notation:

$$L_1 = \left\{x^2 : x \in \{0, \ldots, 9\}\right\}$$
$$L_2 = \left(1, 2, 4, 8, \ldots, 2^{12}\right)$$
$$L_3 = \left\{x \mid x \in L_1 \text{ and } x \text{ is even}\right\}$$

L1 = [x**2 for x in range(10)] # range(n): an iterable over the numbers 0,...,n-1
L2 = [2**i for i in range(13)]
L3 = [x for x in L1 if x % 2 == 0]
print (L1)
print (L2) 
print (L3)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
[0, 4, 16, 36, 64]

[x for x in [x**2 for x in range(10)] if x % 2 == 0]

[0, 4, 16, 36, 64]

Prime numbers with list comprehension

noprimes = [j for i in range(2, 8) for j in range(i*2, 50, i)] 
# range(s,t,i): an iterable over the numbers in [s,t), incremented by i
print (noprimes)
primes = [x for x in range(2, 50) if x not in noprimes]
print (primes)

[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 10, 15, 20, 25, 30, 35, 40, 45, 12, 18, 24, 30, 36, 42, 48, 14, 21, 28, 35, 42, 49]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

primes = [x for x in range(2, 50) if x not in [j for i in range(2, 8) for j in range(i*2, 50, i) ]]
print (primes)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
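
The comprehension above works because every composite number below 50 has a divisor of at most 7 (since 7 * 7 = 49), which is why range(2, 8) suffices. The sketch below generalizes the idea to an arbitrary bound. The function name primes_below is hypothetical, and a set is used for the composites so that the membership test is fast.

```python
def primes_below(n):
    """Return the primes in [2, n), using the same idea as the comprehension above."""
    # Any composite below n has a divisor no larger than sqrt(n)
    limit = int(n ** 0.5) + 1
    # Multiples of each i, starting at 2*i, are composite
    noprimes = {j for i in range(2, limit) for j in range(i * 2, n, i)}
    return [x for x in range(2, n) if x not in noprimes]

print(primes_below(50))
```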

words = 'The quick brown fox jumps over the lazy dog'.split()
print(words) 
upper = [w.upper() for w in words]
print(upper)
stuff = [[w.upper(), w.lower(), len(w)] for w in words]
print(stuff)

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
['THE', 'QUICK', 'BROWN', 'FOX', 'JUMPS', 'OVER', 'THE', 'LAZY', 'DOG']
[['THE', 'the', 3], ['QUICK', 'quick', 5], ['BROWN', 'brown', 5], ['FOX', 'fox', 3], ['JUMPS', 'jumps', 5], ['OVER', 'over', 4], ['THE', 'the', 3], ['LAZY', 'lazy', 4], ['DOG', 'dog', 3]]

s = input('Give numbers separated by comma: ')
x = [int(n) for n in s.split(',')]
print(x)

Give numbers separated by comma: 1,2,3,4
[1, 2, 3, 4]

y = s.split(',')
print(y)
print(y[0]+y[1])
print(x[0]+x[1])

['1', '2', '3', '4']
12
3

#create a vector of 10 zeros
z = [0 for i in range(10)]
print(z)
#create a 10x10 matrix with all 0s
M = [[0 for i in range(10)] for j in range(10)]
#set the diagonal to 1
for i in range(10): M[i][i] = 1
print(M)
#create a list of random integers in [0,99]
import random
R = [random.choice(range(100)) for i in range(10)]
print(R)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]
[69, 93, 69, 2, 26, 93, 90, 89, 5, 4]

[2*x for x in R]

[138, 186, 138, 4, 52, 186, 180, 178, 10, 8]

# Removing elements from a list while you iterate it can lead to problems
L = [1,2,4,5,6,8]
for x in L:
    if x%2 == 0:
        L.remove(x)
print(L)

[1, 4, 5, 8]
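
To see why 4 and 8 survive, it helps to trace the loop: the for statement walks the list by position, and each removal shifts the remaining elements one place to the left, so the element that follows a removed one is skipped. The sketch below records the elements that the loop actually visits; the list visited is introduced here only for illustration.

```python
L = [1, 2, 4, 5, 6, 8]
visited = []             # elements the loop actually looks at

for x in L:
    visited.append(x)
    if x % 2 == 0:
        L.remove(x)      # shifts later elements left; the next one is skipped

print(visited)           # 4 and 8 never appear here
print(L)
```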

#Another way to do this:
L = [1,2,4,5,6,8]
L = [x for x in L if x%2 == 1] # creates a new list and rebinds the name L
L[:] = [x for x in L if x%2 == 1] # alternative: replaces the contents of the existing list in place
print(L)

[1, 5]

L = [1,2,4,5,6,8]
R =[y for y in L if y%2 == 0]
for x in R: L.remove(x)
print(L)

[1, 5]

Tuples:¶

Tuples are an immutable type. Like strings, once you create them, you cannot change them. It is their immutability that allows you to use them as keys in dictionaries. However, they are similar to lists in that they are a collection of data and that data can be of differing types.

# Tuple grocery list

groceries = ('orange', 'meat', 'asparagus', 2.5, True)

print (groceries)

#print(groceries[2])

#groceries[2] = 'milk'

L = [1,2,3]
t = tuple(L)
print(t)
L[1] = 5
print(t)
t[1] = 4  # raises TypeError: tuples do not support item assignment

('orange', 'meat', 'asparagus', 2.5, True)
(1, 2, 3)
(1, 2, 3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-ead4db6fadf4> in <module>()
     14 L[1] = 5
     15 print(t)
---> 16 t[1] = 4

TypeError: 'tuple' object does not support item assignment

Sets:¶

A set is an unordered collection of items that cannot contain duplicates. Sets support the same operations as sets in mathematics.

numbers = range(10)
evens = [0, 2, 4, 6, 8]

evens = set(evens)
numbers = set(numbers)

# Use difference to find the odds
odds = numbers - evens

print (odds)

# Note: Set also allows for use of union (|), and intersection (&)

a = [2,1,2,1]
print (a)
a = set(a)
print(a)

[2, 1, 2, 1]
{1, 2}
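
The union and intersection operators mentioned in the note above can be sketched as follows, together with a membership test. The set small is made up for the example, and the results are sorted before printing only to make the output order predictable.

```python
numbers = set(range(10))
evens = {0, 2, 4, 6, 8}
small = {0, 1, 2, 3}

odds = numbers - evens    # difference: elements of numbers that are not in evens
both = evens & small      # intersection: elements in both sets
either = evens | small    # union: elements in at least one of the sets

print(sorted(odds))
print(sorted(both))
print(sorted(either))
print(3 in evens)         # membership test
```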

Dictionaries:¶

A dictionary is a map from keys to values. This is one of the most useful structures. Keys must be unique and hashable; immutable types such as strings, numbers, and tuples qualify.

# A simple dictionary

simple_dict = {'cse012': 'data mining'}

# Access by key
print (simple_dict['cse012'])

data mining

# A longer dictionary
classes = {
    'cse012': 'data mining',
    'cse205': 'object oriented programming'
}

# Check if item is in dictionary
print ('cse012' in classes)

# Add new item
classes['L14'] = 'social networks'
print (classes['L14'])

# Print just the keys
print (list(classes.keys()))

# Print just the values
print (list(classes.values()))

# Print the items in the dictionary
print (list(classes.items()))

# Print dictionary pairs another way
for key, value in classes.items():
    print (key, value)

True
social networks
['cse205', 'L14', 'cse012']
['object oriented programming', 'social networks', 'data mining']
[('cse205', 'object oriented programming'), ('L14', 'social networks'), ('cse012', 'data mining')]
cse205 object oriented programming
L14 social networks
cse012 data mining

for key in classes:
    print (key, classes[key])

cse205 object oriented programming
L14 social networks
cse012 data mining

#change values in a dictionary
classes['L14'] = 'graduate social networks'
print (classes['L14'])

graduate social networks

# Complex Data structures
# Dictionaries inside a dictionary!

professors = {
    "prof1": {
        "name": "Panayiotis Tsaparas",
        "department": "Computer Science",
        "interests": ["algorithms", "data mining", "machine learning",]
    },
    "prof2": {
        "name": "Yanis Varoufakis",
        "department": "Economics",
        "interests": ["debt", "game theory", "parallel currency",],
    }
}

for prof in professors:
    print (professors[prof]["name"])

Yanis Varoufakis
Panayiotis Tsaparas

professors['prof2']['interests'][1]

'game theory'

Depending on the task that we want to perform, the structure we use makes a big difference in efficiency. When searching over a structure, it is important to use a set or a dictionary: both are hash tables, so a lookup takes constant time in expectation (O(n) in the worst case), whereas searching a list takes time linear in its length. This makes a huge difference when dealing with large datasets.

# The importance of using the right structure:
import random
import time

L = [random.choice(range(1000000)) for i in range(1000)]
t = time.perf_counter()  # time.clock() was removed in Python 3.8
count = 0
for x in range(1000000):
    if x in L:
        count += 1
print (time.perf_counter() - t)

27.05953258704466

S = set(L)
t = time.perf_counter()  # time.clock() was removed in Python 3.8
count = 0
for x in range(1000000):
    if x in S:
        count += 1
print (time.perf_counter() - t)

0.16005781903510297

#What structure should we use for storing the edges of a graph with millions of edges?
L = [[1,2],[1,4],[2,5],[3,7],[5,7],[6,8]] # etc...
S = set([tuple(x) for x in L])
print(S)

#What if we have also a weight on the edges, and we want to get the weight?
L = [[1,2,0.2],[1,4,0.5],[2,5,0.8],[3,7,0.1],[5,7,0.4],[6,8,0.9]] 
D = {}
for x in L:
    D[tuple(x[0:2])] = float(x[2])
print(D)

#What if we want to be able to get the neighbors of each node and the weight of the edge?
D = {}
for x in L:
    v1,v2,w = x
    if v1 in D:
        D[v1][v2] = w
    else:
        D[v1] = {}
        D[v1][v2] = w
print(D)
print(D[1][2])

{(1, 2), (6, 8), (5, 7), (1, 4), (3, 7), (2, 5)}
{(1, 2): 0.2, (6, 8): 0.9, (3, 7): 0.1, (2, 5): 0.8, (5, 7): 0.4, (1, 4): 0.5}
{1: {2: 0.2, 4: 0.5}, 2: {5: 0.8}, 3: {7: 0.1}, 5: {7: 0.4}, 6: {8: 0.9}}
0.2
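
The if/else that creates the inner dictionary on first use can be shortened with collections.defaultdict from the standard library. The sketch below builds the same nested structure; the names edges and graph are illustrative, and the edges are treated as directed, as in the loop above.

```python
from collections import defaultdict

edges = [[1, 2, 0.2], [1, 4, 0.5], [2, 5, 0.8], [3, 7, 0.1], [5, 7, 0.4], [6, 8, 0.9]]

# A missing key automatically gets an empty dict as its value
graph = defaultdict(dict)
for v1, v2, w in edges:
    graph[v1][v2] = w

print(dict(graph))
print(graph[1][2])          # weight of the edge (1, 2)
print(list(graph[1]))       # neighbors of node 1
```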

Iterators and Generators¶

We can loop over the elements of a list using for

for i in [1,2,3,4]:
    print (i)

# also:
X = [1,2,3,4]
for i in range(len(X)):
    print(X[i])

#iterating two lists in lockstep:
Y = [2,3,4,5]
for x,y in zip(X,Y):
    print(x+y)

When we use for with a dictionary, it loops over the keys of the dictionary.

for k in {'Alice': 'Wonderland', 'Bob': 'Sfougkarakis'}:
    print (k)

Bob
Alice

When we use for with a string, it loops over the characters of the string.

for l in 'python is magic':
    print (l)

p
y
t
h
o
n
 
i
s
 
m
a
g
i
c

All these are iterable objects

list({'Alice': 'Wonderland', 'Bob': 'Sfougkarakis'})

['Bob', 'Alice']

list('python is magic')

['p', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'm', 'a', 'g', 'i', 'c']

print ('-'.join('panayiotis'))
print ('-'.join(['a','b','c']))

p-a-n-a-y-i-o-t-i-s
a-b-c

The function iter takes an iterable object as input and returns an iterator.

Many functions accept any iterable, including an iterator, as input.

a = [x for x in range(10)]
print (sum(iter(a)))
print (sum(a))

45
45
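
What a for loop does behind the scenes can be sketched with iter and next: the iterator hands out one element per call to next, and raises StopIteration when nothing is left. The variable names here are illustrative.

```python
a = [10, 20, 30]
it = iter(a)            # ask the iterable for an iterator

first = next(it)
second = next(it)
third = next(it)
print(first, second, third)

# A further call raises StopIteration; a for loop catches this internally
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```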

Generators are functions that produce a sequence of results, one at a time, instead of a single value.

def func(n):
    for i in range(n):
        yield i

g = func(10)
print (g)
print (next(g))
print (next(g))
print (next(g))

<generator object func at 0x00000000013E3AF8>
0
1
2

def demonstrate(n):
    print ('begin execution of the function')
    for i in range(n):
        print ('before yield')
        yield i*i
        print ('after yield')

g = demonstrate(3)
print (next(g))
print('exited demo')
print (next(g))
print('exited demo')
#print (next(g))
#print (next(g))

begin execution of the function
before yield
0
exited demo
after yield
before yield
1
exited demo

g = (x for x in range(10))
print (g)
print (sum(g))
y = [x for x in range(10)]
print (y)
print(sum(y))

<generator object <genexpr> at 0x0000000004DECF30>
45
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
45
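
One practical difference between the generator expression and the list above: a generator can be consumed only once, while a list can be traversed repeatedly. A small sketch, with illustrative variable names:

```python
g = (x for x in range(10))
first_sum = sum(g)      # consumes the generator
second_sum = sum(g)     # nothing is left, and the sum of no elements is 0

y = [x for x in range(10)]
list_sums = (sum(y), sum(y))   # a list can be reused

print(first_sum, second_sum)
print(list_sums)
```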

File I/O¶

# Writing to a file
with open("example1.txt", "w") as f:
    f.write("Hello World! \n")
    f.write("How are you? \n")
    f.write("I'm fine.")

# Writing to a file without a with block; the file must be closed explicitly
f = open("example2.txt", "w")
f.write("Hello World! \n")
f.write("How are you? \n")
f.write("I'm fine.")
f.close()

# Reading from a file
with open("example1.txt", "r") as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        print (words)

['Hello', 'World!']
['How', 'are', 'you?']
["I'm", 'fine.']

# Count lines and words in a file
lines = 0
words = 0
the_file = "example2.txt"

with open(the_file, 'r') as f:  # the with block closes the file automatically
    for line in f:
        lines += 1
        words += len(line.split())
print ("There are %i lines and %i words in the %s file." % (lines, words, the_file))

There are 3 lines and 7 words in the example2.txt file.

Reading and Writing to Standard Input/Output

import sys

for line in sys.stdin:
    sys.stdout.write(line)

Functions¶

def linear(x):
    a = 2
    b = 1
    return a*x+b

print (linear(10))

21

def displayperson(name,age,univ = 'UoI'):
    print ("My name is "+ name +", I am "+age+" years old and I study at " + univ+ ".")
    return
    
displayperson("Bob","20")
displayperson("Alice","21",'NTUA')

My name is Bob, I am 20 years old and I study at UoI.
My name is Alice, I am 21 years old and I study at NTUA.

Lambda functions¶

Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called "lambda".

def f (x): return x**2
print (f(8))

64

g = lambda x: x**2
print (g(8))

64

The above pieces of code are equivalent to each other! Note that there is no "return" statement in the lambda function. A lambda function does not need to be assigned to a variable; it can be used anywhere in the code where a function is expected.
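
As an illustration of using a lambda where a function is expected without giving it a name, the sketch below passes one as the key argument of the built-in sorted. The word list is made up for the example.

```python
words = ["banana", "fig", "cherry", "kiwi"]

# Sort by length; the lambda is passed directly and never bound to a name
by_length = sorted(words, key=lambda w: len(w))
print(by_length)

# Sort by the last letter of each word
by_last = sorted(words, key=lambda w: w[-1])
print(by_last)
```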

f = lambda x, y : x + y
print (f(2,3))

5

def multiply (n): return lambda x: x*n
 
multiply_by_2 = multiply(2)
g = multiply(6)
print (multiply_by_2)
print (multiply_by_2(10), g(10))

<function multiply.<locals>.<lambda> at 0x0000000004E25840>
20 60

multiply(3)(30)

90

The map() function¶

The advantage of the lambda operator can be seen when it is used in combination with the map() function. map() is a function with two arguments:

r = map(func,s)

func is a function and s is an iterable (e.g., a list). map returns an iterator that applies func to each element of s. Convert it to a list if you want to index it or traverse it more than once.

map() and lambda give functionality very similar to that of list comprehensions.

def dollar2euro(x):
    return 0.89*x
def euro2dollar(x):
    return 1.12*x

amounts= (100, 200, 300, 400)
euros = map(dollar2euro, amounts)  # converting dollars to euros yields euro amounts
print (euros)
print (list(euros))

<map object at 0x0000000004E188D0>
[89.0, 178.0, 267.0, 356.0]

amounts= (100, 200, 300, 400)
dollars = map(euro2dollar, amounts)  # converting euros to dollars yields dollar amounts
print (list(dollars))

[112.00000000000001, 224.00000000000003, 336.00000000000006, 448.00000000000006]

list(map(lambda x: 0.89*x, amounts))

[89.0, 178.0, 267.0, 356.0]
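
To make the comparison with list comprehensions concrete, the sketch below computes the same conversion both ways. The 0.89 rate is the one used above; the result names are illustrative.

```python
amounts = (100, 200, 300, 400)

with_map = list(map(lambda x: 0.89 * x, amounts))
with_comprehension = [0.89 * x for x in amounts]

print(with_map)
print(with_map == with_comprehension)
```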

map can also be applied to more than one list. The function must then take as many arguments as there are lists, and map stops as soon as the shortest list is exhausted.

a = [1,2,3,4,5]
b = [-1,-2,-3, -4, -5] 
c = [10, 20 , 30, 40, 50]

l1 = list(map(lambda x,y: x+y, a,b))
print (l1)
l2 = list(map (lambda x,y,z: x-y+z, a,b,c))
print (l2)

[0, 0, 0, 0, 0]
[12, 24, 36, 48, 60]

words = 'The quick brown fox jumps over the lazy dog'.split()
uwords = list(map(lambda w: [w.upper(), w.lower(), len(w)], words))
for t in uwords:
    print (t)

['THE', 'the', 3]
['QUICK', 'quick', 5]
['BROWN', 'brown', 5]
['FOX', 'fox', 3]
['JUMPS', 'jumps', 5]
['OVER', 'over', 4]
['THE', 'the', 3]
['LAZY', 'lazy', 4]
['DOG', 'dog', 3]

The filter() function¶

The function filter(function, list) keeps only the elements of the list for which function returns True, and discards the rest. It returns an iterator.

nums = [i for i in range(100)]
print (nums)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

even = list(filter(lambda x: x%2==0 and x!=0, nums))
print (even)

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
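
As with map, the same selection can be written as a list comprehension with an if clause. The sketch below runs both forms on a shorter list and checks that they agree; the result names are illustrative.

```python
nums = list(range(20))

# filter keeps the elements for which the function returns True
even_filter = list(filter(lambda x: x % 2 == 0 and x != 0, nums))

# The same selection written as a list comprehension
even_comp = [x for x in nums if x % 2 == 0 and x != 0]

print(even_filter)
print(even_filter == even_comp)
```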

Command line arguments¶

To read the command line arguments, use the sys.argv list. The first element of this list is the program name. More sophisticated processing can be done using the getopt.getopt function.

import sys

print ('Number of arguments:', len(sys.argv), 'arguments.')
print ('Argument List:', str(sys.argv))

Number of arguments: 5 arguments.
Argument List: ['C:\\Anaconda3\\lib\\site-packages\\IPython\\kernel\\__main__.py', '-f', 'C:\\Users\\Panayiotis\\.ipython\\profile_default\\security\\kernel-a3b88187-1a4c-44b6-9924-aa3dfaec2c63.json', '--profile-dir', 'C:\\Users\\Panayiotis\\.ipython\\profile_default']
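
For anything beyond a couple of positional arguments, the standard library's argparse module is a common alternative to getopt. The sketch below is only illustrative: the argument filename and the option --count are hypothetical, and the argument list is passed explicitly so that the snippet also runs inside a notebook, where sys.argv holds the kernel's own arguments (as in the output above).

```python
import argparse

parser = argparse.ArgumentParser(description="Illustrative argument parsing")
parser.add_argument("filename", help="input file (hypothetical positional argument)")
parser.add_argument("--count", type=int, default=1, help="hypothetical integer option")

# In a script, call parser.parse_args() with no arguments to read sys.argv[1:]
args = parser.parse_args(["data.txt", "--count", "3"])
print(args.filename, args.count)
```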

Libraries¶

Python is a high-level open-source language, and the Python world is inhabited by many packages or libraries that provide useful things like array operations, plotting functions, and much more. We can (and we should) import libraries of functions to expand the capabilities of Python in our programs.

There are many Python libraries for data mining, and we will use many of these libraries in this course.

Example: The random library¶

import random
print (random.choice(range(10))) # generates a random number in the range [0,9]
myList = [2, 109, False, 10, "data", 482, "mining"]
print(random.choice(myList))

1
mining

from random import shuffle # imports a specific function of random that can be used without the random. prefix
x = [i for i in range(10)]
print (x)
shuffle(x)
print (x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[4, 7, 5, 6, 0, 3, 9, 8, 2, 1]

# some more methods of random
print(random.sample([1,3,4,7,8,9,20],3)) #samples 3 elements uniformly at random
print(random.random()) # a random number in [0,1)
print(random.uniform(-1,1)) #a random number in [-1,1]
print(random.gauss(0,1)) #sample from a gaussian distribution with mean 0 and std 1

[9, 20, 4]
0.8665630576491753
-0.6164878646949343
-0.9080984966255757

Including images¶

from IPython.display import Image
Image(filename = "CSE-UOI-LOGO-EN.jpg",width = 231.4, height = 272.9)
#Image(response.content)