For those of you who know Python, this class aims to refresh your memory. For those of you who don't know Python but do know programming, this class aims to give you an idea of how Python is similar to, and different from, your favorite programming language.
Printing and reading from standard IO
# Writing to standard out:
print ("Hello World!")
# Reading from standard input and writing to standard output
name = input("What is your name? ")
print("Hello ",end = '')
print(name)
Basic data types:
These are all objects in Python.
#String
a = "apple"
type(a)
#print type(a)
#Integer
b = 3
type(b)
#print type(b)
#Float
c = 3.2
type(c)
#print type(c)
#Boolean
d = True
type(d)
#print type(d)
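a+b # raises TypeError: a string and an integer cannot be concatenated
a+str(b) # works: the integer is converted to a string first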
Python doesn't require explicitly declared variable types like Java and other languages.
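As a small illustration of this point, the sketch below rebinds one name to values of different types. The variable names are chosen only for this example.

```python
# The same name can be bound to objects of different types over time.
value = 3
first_type = type(value).__name__   # name of the type currently bound to value

value = "three"                     # no declaration needed to switch types
second_type = type(value).__name__

print(first_type, second_type)
```

The type belongs to the object, not to the variable name, which is why no declaration is needed.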
In Python 3, the / operator always yields a floating-point number.
14/b
14/c
If you want integer (floor) division, use the // operator.
14//b
String manipulation will be very important for many of the tasks we will do. Therefore let us play around a bit with strings.
#Concatenating strings
a = "Hello" # String
b = " World" # Another string
print (a + b) # Concatenation
# Slicing strings
a = "World"
print (a[0])
print (a[-1])
print ("World"[0:4])
print (a[::-1])
# Popular string functions
a = "Hello World"
print ("-".join(a))
print (a.startswith("Wo"))
print (a.endswith("rld"))
print (a.replace("o","0").replace("d","[)").replace("l","1"))
print (a.split())
print (a.split('o'))
Strings are an example of an immutable data type. Once you instantiate a string, you cannot change any of its characters.
string = "string"
string[-1] = "y" # Attempting to assign "y" to the last character raises a TypeError
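Since a string cannot be changed in place, the usual approach is to build a new string. A minimal sketch, using illustrative variable names:

```python
word = "string"

# Slicing and concatenation create a new string; the original is untouched.
new_word = word[:-1] + "y"

# replace() also returns a new string rather than modifying the original.
replaced = word.replace("g", "y")

print(new_word)
print(replaced)
print(word)  # still the original value
```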
Python uses indentation to group statements together. To write a short loop in Java, you might use:
for (int i = 0; i < 5; i++){
    System.out.println("Hi!");
}
Python does not use curly braces like Java, so the same program is written in Python as follows:
for i in range(5):
    print("Hi!")
If you have nested for-loops, there is a further indent for the inner loop.
for i in range(3):
    for j in range(3):
        print (i, j)
    print ("This statement is within the i-loop, but not the j-loop")
Numbers and strings alone are not enough! We need data types that can hold multiple values.
Lists are mutable, meaning they can be altered after they are created. A list is a collection of data, and that data can be of differing types.
groceries = []
# Add to list
groceries.append("oranges")
groceries.append("meat")
groceries.append("asparagus")
# Access by index
print (groceries[2])
print (groceries[0])
# Find number of things in list
print (len(groceries))
# Sort the items in the list
groceries.sort()
print (groceries)
# List Comprehension
veggie = [x for x in groceries if x != "meat"] # compare values with !=; "is not" tests object identity
print (veggie)
# Remove from list
groceries.remove("asparagus")
#groceries.pop()
print (groceries)
#The list is mutable
groceries[0] = 2
print (groceries)
groceries.sort() # raises TypeError in Python 3: an int and a str cannot be ordered
print (groceries)
L = ['x',1,3,'y']
print(L.pop())
print(L.pop(0))
# lists are objects
L = [2,5,1,4]
X = L
L.sort()
print (X)
L.append(3)
print(X)
L = sorted(L)
print(L)
print(X)
#slicing works for lists as for strings
print (L[1:-1])
print (L[2:])
print(L[:-2])
print(L[1:-1:2])
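Because lists are objects, assigning a list to a second name does not copy it. The sketch below contrasts an alias with a shallow copy; the names are illustrative only.

```python
original = [2, 5, 1, 4]
alias = original            # a second name for the same list object
shallow = original.copy()   # a new list holding the same elements

original.sort()             # sorts in place, so the alias sees the change

print(alias)
print(shallow)
print(alias is original, shallow is original)
```

`list(original)` and `original[:]` are other common ways to make a shallow copy.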
Recall the mathematical notation:
$$L_1 = \left\{x^2 : x \in \{0\ldots 9\}\right\}$$
$$L_2 = \left(1, 2, 4, 8,\ldots, 2^{12}\right)$$
$$M = \left\{x \mid x \in L_1 \text{ and } x \text{ is even}\right\}$$
L1 = [x**2 for x in range(10)] # range(n): returns an iterable over the numbers 0,...,n-1
L2 = [2**i for i in range(13)]
L3 = [x for x in L1 if x % 2 == 0]
print (L1)
print (L2)
print (L3)
[x for x in [x**2 for x in range(10)] if x % 2 == 0]
Prime numbers with list comprehension
noprimes = [j for i in range(2, 8) for j in range(i*2, 50, i)]
# range(s,t,i): returns an iterable over the numbers in [s,t), in steps of i
print (noprimes)
primes = [x for x in range(2, 50) if x not in noprimes]
print (primes)
primes = [x for x in range(2, 50) if x not in [j for i in range(2, 8) for j in range(i*2, 50, i) ]]
print (primes)
words = 'The quick brown fox jumps over the lazy dog'.split()
print(words)
upper = [w.upper() for w in words]
print(upper)
stuff = [[w.upper(), w.lower(), len(w)] for w in words]
print(stuff)
s = input('Give numbers separated by comma: ')
x = [int(n) for n in s.split(',')]
print(x)
y = s.split(',')
print(y)
print(y[0]+y[1])
print(x[0]+x[1])
#create a vector of all 10 zeros
z = [0 for i in range(10)]
print(z)
#create a 10x10 matrix with all 0s
M = [[0 for i in range(10)] for j in range(10)]
#set the diagonal to 1
for i in range(10): M[i][i] = 1
print(M)
#create a list of random integers in [0,99]
import random
R = [random.choice(range(100)) for i in range(10)]
print(R)
[2*x for x in R]
# Removing elements from a list while you iterate it can lead to problems
L = [1,2,4,5,6,8]
for x in L:
    if x%2 == 0:
        L.remove(x)
print(L) # [1, 4, 5, 8]: the elements 4 and 8 were skipped
#Another way to do this:
L = [1,2,4,5,6,8]
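L = [x for x in L if x%2 == 1] # creates a new list and rebinds the name L
# Alternatively, assigning to L[:] replaces the contents of the existing list in place:
L[:] = [x for x in L if x%2 == 1]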
print(L)
L = [1,2,4,5,6,8]
R =[y for y in L if y%2 == 0]
for x in R: L.remove(x)
print(L)
Tuples are an immutable type. Like strings, once you create them, you cannot change them. It is their immutability that allows you to use them as keys in dictionaries. However, they are similar to lists in that they are a collection of data and that data can be of differing types.
# Tuple grocery list
groceries = ('orange', 'meat', 'asparagus', 2.5, True)
print (groceries)
#print(groceries[2])
#groceries[2] = 'milk'
L = [1,2,3]
t = tuple(L)
print(t)
L[1] = 5
print(t)
#t[1] = 4
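The remark about dictionary keys can be sketched as follows. The `grid` dictionary and its contents are hypothetical and serve only as an illustration.

```python
# A tuple can be used as a dictionary key because it is immutable (hashable).
grid = {(0, 0): "start", (2, 3): "goal"}
print(grid[(2, 3)])

# A list is mutable, so using it as a key raises a TypeError.
try:
    grid[[1, 1]] = "wall"
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print("Lists cannot be keys:", err)
```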
A set is an unordered collection of items that cannot contain duplicates. Sets support the same operations as sets in mathematics.
numbers = range(10)
evens = [2, 4, 6, 8]
evens = set(evens)
numbers = set(numbers)
# Use difference to find the odds
odds = numbers - evens
print (odds)
# Note: Set also allows for use of union (|), and intersection (&)
a = [2,1,2,1]
print (a)
a = set(a)
print(a)
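A short sketch of the other set operators, using two small illustrative sets (the names `small_evens` and `small_numbers` are not from the examples above):

```python
small_evens = {0, 2, 4, 6, 8}
small_numbers = {0, 1, 2, 3}

union = small_evens | small_numbers          # elements in either set
intersection = small_evens & small_numbers   # elements in both sets
difference = small_evens - small_numbers     # elements only in the first set

# Sets are unordered, so sort them for a predictable display.
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```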
A dictionary is a map of keys to values. This is one of the most useful structures. Keys must be unique and immutable.
# A simple dictionary
simple_dict = {'cse012': 'data mining'}
# Access by key
print (simple_dict['cse012'])
# A longer dictionary
classes = {
    'cse012': 'data mining',
    'cse205': 'object oriented programming'
}
# Check if item is in dictionary
print ('cse012' in classes)
# Add new item
classes['L14'] = 'social networks'
print (classes['L14'])
# Print just the keys
print (list(classes.keys()))
# Print just the values
print (list(classes.values()))
# Print the items in the dictionary
print (list(classes.items()))
# Print dictionary pairs another way
for key, value in classes.items():
    print (key, value)
for key in classes:
    print (key, classes[key])
#change values in a dictionary
classes['L14'] = 'graduate social networks'
print (classes['L14'])
# Complex Data structures
# Dictionaries inside a dictionary!
professors = {
    "prof1": {
        "name": "Panayiotis Tsaparas",
        "department": "Computer Science",
        "research interests": ["algorithms", "data mining", "machine learning",]
    },
    "prof2": {
        "name": "Yanis Varoufakis",
        "department": "Economics",
        "interests": ["debt", "game theory", "parallel currency",],
    }
}
for prof in professors:
    print (professors[prof]["name"])
professors['prof2']['interests'][1]
Depending on the task we want to perform, the structure we use makes a big difference in efficiency. When searching over a structure, it is important to use a set or a dictionary, since a lookup takes constant time in expectation (although O(n) in the worst case), whereas searching a list takes time linear in its length. This makes a huge difference when dealing with large datasets.
# The importance of using the right structure:
import random
L = [random.choice(range(1000000)) for i in range(1000)]
import time
t = time.perf_counter() # time.clock() was removed in Python 3.8
count = 0
for x in range(1000000):
    if x in L:
        count += 1
print (time.perf_counter() - t)
S = set(L)
t = time.perf_counter()
count = 0
for x in range(1000000):
    if x in S:
        count += 1
print (time.perf_counter() - t)
#What structure should we use for storing the edges of a graph with millions of edges?
L = [[1,2],[1,4],[2,5],[3,7],[5,7],[6,8]] # etc...
S = set([tuple(x) for x in L])
print(S)
#What if we have also a weight on the edges, and we want to get the weight?
L = [[1,2,0.2],[1,4,0.5],[2,5,0.8],[3,7,0.1],[5,7,0.4],[6,8,0.9]]
D = {}
for x in L:
    D[tuple(x[0:2])] = float(x[2])
print(D)
#What if we want to be able to get the neighbors of each node and the weight of the edge?
D = {}
for x in L:
    v1,v2,w = x
    if v1 in D:
        D[v1][v2] = w
    else:
        D[v1] = {}
        D[v1][v2] = w
print(D)
print(D[1][2])
We can loop over the elements of a list using for
for i in [1,2,3,4]:
    print (i)
# also:
X = [1,2,3,4]
for i in range(len(X)):
    print(X[i])
#iterating two lists in lockstep:
Y = [2,3,4,5]
for x,y in zip(X,Y):
    print(x+y)
When we use for with a dictionary, it loops over the keys of the dictionary.
for k in {'Alice': 'Wonderland', 'Bob': 'Sfougkarakis'}:
    print (k)
When we use for with a string, it loops over the characters of the string.
for l in 'python is magic':
    print (l)
All these are iterable objects
list({'Alice': 'Wonderland', 'Bob': 'Sfougkarakis'})
list('python is magic')
print ('-'.join('panayiotis'))
print ('-'.join(['a','b','c']))
The function iter takes an iterable object as input and returns an iterator.
Many functions accept iterators as input.
a = [x for x in range(10)]
print (sum(iter(a)))
print (sum(a))
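To make the behavior of an iterator more concrete, the sketch below steps through one manually with `next`. The list and variable names are illustrative.

```python
letters = iter(['a', 'b', 'c'])

# Each call to next() returns the following element.
first = next(letters)
second = next(letters)
third = next(letters)
print(first, second, third)

# Once the iterator is exhausted, next() raises StopIteration.
try:
    next(letters)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```

A for loop performs these steps implicitly and stops when StopIteration is raised.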
Generators are functions that produce a sequence of results (rather than a single value).
def func(n):
    for i in range(n):
        yield i
g = func(10)
print (g)
print (next(g))
print (next(g))
print (next(g))
def demonstrate(n):
    print ('begin execution of the function')
    for i in range(n):
        print ('before yield')
        yield i*i
        print ('after yield')
g = demonstrate(3)
print (next(g))
print('exited demo')
print (next(g))
print('exited demo')
#print (next(g))
#print (next(g))
g = (x for x in range(10))
print (g)
print (sum(g))
y = [x for x in range(10)]
print (y)
print(sum(y))
# Writing to a file
with open("example1.txt", "w") as f:
    f.write("Hello World! \n")
    f.write("How are you? \n")
    f.write("I'm fine.")
# Writing to a file without a with block (the file must be closed explicitly)
f = open("example2.txt", "w")
f.write("Hello World! \n")
f.write("How are you? \n")
f.write("I'm fine.")
f.close()
# Reading from a file
with open("example1.txt", "r") as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        print (words)
# Count lines and words in a file
lines = 0
words = 0
the_file = "example2.txt"
with open(the_file, 'r') as f: # the with block closes the file when done
    for line in f:
        lines += 1
        words += len(line.split())
print ("There are %i lines and %i words in the %s file." % (lines, words, the_file))
Reading and Writing to Standard Input/Output
import sys
for line in sys.stdin:
    sys.stdout.write(line)
def linear(x):
    a = 2
    b = 1
    return a*x+b
print (linear(10))
def displayperson(name,age,univ = 'UoI'):
    print ("My name is "+ name +", I am "+age+" years old and I study at " + univ+ ".")
    return
displayperson("Bob","20")
displayperson("Alice","21",'NTUA')
Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called "lambda".
def f (x): return x**2
print (f(8))
g = lambda x: x**2
print (g(8))
The above pieces of code are equivalent to each other! Note that there is no "return" statement in the lambda function. A lambda function does not need to be assigned to a variable; it can be used anywhere in the code where a function is expected.
f = lambda x, y : x + y
print (f(2,3))
def multiply (n): return lambda x: x*n
multiply_by_2 = multiply(2)
g = multiply(6)
print (multiply_by_2)
print (multiply_by_2(10), g(10))
multiply(3)(30)
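One common place where a function is expected is the `key` argument of the built-in `sorted`. The sketch below passes a lambda there directly, without binding it to a name. The word list is illustrative.

```python
fruit = ["banana", "fig", "cherry"]

# Sort by word length; ties keep their original order because sorted is stable.
by_length = sorted(fruit, key=lambda w: len(w))

# Sort by the last letter of each word.
by_last_letter = sorted(fruit, key=lambda w: w[-1])

print(by_length)
print(by_last_letter)
```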
The advantage of the lambda operator can be seen when it is used in combination with the map() function. map() is a function with two arguments:
r = map(func,s)
func is a function and s is a sequence (e.g., a list). map returns an iterator that applies func to each element of s. You need to convert it into a list if you want to index it or use it more than once.
map() and lambda give functionality very similar to that of list comprehensions.
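def dollar2euro(x):
    return 0.89*x
def euro2dollar(x):
    return 1.12*x
amounts= (100, 200, 300, 400)
euros = map(dollar2euro, amounts) # treat the amounts as dollars and convert them to euros
print (euros) # prints the map object, not the converted values
print (list(euros))
amounts= (100, 200, 300, 400)
dollars = map(euro2dollar, amounts) # treat the amounts as euros and convert them to dollars
print (list(dollars))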
list(map(lambda x: 0.89*x, amounts))
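For comparison, here is a minimal sketch of the same conversion written both with map and lambda and as a list comprehension. The variable names are illustrative.

```python
amounts = (100, 200, 300, 400)

# map applies the lambda lazily; list() materializes the results.
with_map = list(map(lambda x: 0.89 * x, amounts))

# The list comprehension computes the same values directly.
with_comprehension = [0.89 * x for x in amounts]

print(with_map == with_comprehension)
```

Which form to use is largely a matter of style; comprehensions are often considered easier to read when the function is a short expression.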
map can also be applied to more than one list. The function must then take as many arguments as there are lists, and map stops as soon as the shortest list is exhausted.
a = [1,2,3,4,5]
b = [-1,-2,-3, -4, -5]
c = [10, 20 , 30, 40, 50]
l1 = list(map(lambda x,y: x+y, a,b))
print (l1)
l2 = list(map (lambda x,y,z: x-y+z, a,b,c))
print (l2)
words = 'The quick brown fox jumps over the lazy dog'.split()
uwords = list(map(lambda w: [w.upper(), w.lower(), len(w)], words))
for t in uwords:
    print (t)
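A small sketch of what happens when map is given lists of different lengths: it stops at the end of the shortest one, just like zip. The list names are illustrative.

```python
short = [1, 2, 3]
longer = [10, 20, 30, 40, 50]

# map pairs up elements position by position and stops after the shortest input.
sums = list(map(lambda x, y: x + y, short, longer))

# The equivalent list comprehension uses zip, which truncates the same way.
sums_zip = [x + y for x, y in zip(short, longer)]

print(sums)
print(sums_zip)
```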
The function filter(function, list) keeps only the elements of a list for which function returns True. It returns an iterator.
nums = [i for i in range(100)]
print (nums)
even = list(filter(lambda x: x%2==0 and x!=0, nums))
print (even)
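As with map, the same result can be obtained with a list comprehension. A minimal sketch on a smaller illustrative range:

```python
small_nums = list(range(20))

# filter keeps the elements for which the lambda returns True.
with_filter = list(filter(lambda x: x % 2 == 0 and x != 0, small_nums))

# The list comprehension expresses the same condition in its if clause.
with_comprehension = [x for x in small_nums if x % 2 == 0 and x != 0]

print(with_filter)
print(with_filter == with_comprehension)
```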
To read the command line arguments, use the sys.argv list. The first element of this list is the program name. More sophisticated processing can be done using the getopt.getopt function.
import sys
print ('Number of arguments:', len(sys.argv), 'arguments.')
print ('Argument List:', str(sys.argv))
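A minimal sketch of getopt.getopt follows. The argument list, the option names `-n` and `--verbose`, and the file name are hypothetical; in a real script the list would come from `sys.argv[1:]`.

```python
import getopt

# Hypothetical arguments, standing in for sys.argv[1:]
argv = ["-n", "5", "--verbose", "input.txt"]

# "n:" declares a short option -n that takes a value;
# ["verbose"] declares a long option --verbose without a value.
opts, args = getopt.getopt(argv, "n:", ["verbose"])

print(opts)  # list of (option, value) pairs
print(args)  # remaining positional arguments
```

The standard library also offers the argparse module, which is commonly preferred for larger command line interfaces.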
Python is a high-level open-source language. Beyond the core language, the Python world is inhabited by many packages or libraries that provide useful things like array operations, plotting functions, and much more. We can (and we should) import libraries of functions to expand the capabilities of Python in our programs.
There are many Python libraries for data mining, and we will use many of them in this course.
import random
print (random.choice(range(10))) # generates a random number in the range [0,9]
myList = [2, 109, False, 10, "data", 482, "mining"]
print(random.choice(myList))
from random import shuffle # imports a specific function of random that can be used without the random. prefix
x = [i for i in range(10)]
print (x)
shuffle(x)
print (x)
# some more methods of random
print(random.sample([1,3,4,7,8,9,20],3)) #samples 3 elements uniformly at random
print(random.random()) # a random number in [0,1)
print(random.uniform(-1,1)) #a random number in [-1,1]
print(random.gauss(0,1)) #sample from a gaussian distribution with mean 0 and std 1
from IPython.display import Image
Image(filename = "CSE-UOI-LOGO-EN.jpg",width = 231.4, height = 272.9)
#Image(response.content)