# Python Review: Core Syntax, Pandas, and scikit-learn

This notebook is a quick review of core Python ideas, plus gentle introductions to pandas and scikit-learn.

## Loops

Use `for` loops to iterate over items in a collection, and `while` loops to repeat until a condition becomes false. Use `break` to stop early and `continue` to skip to the next iteration.

In [None]:
# Example: find the first square number greater than 30
squares = []
for n in range(1, 10):
    sq = n * n
    squares.append(sq)
    if sq > 30:
        print('First square > 30:', sq)
        break

# Example: count down with a while loop
count = 3
while count > 0:
    print(count)
    count -= 1
print('Lift off!')
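
The example above uses `break` but not `continue`. As a minimal sketch (the list `lines` and its contents are made up for illustration), `continue` skips the rest of the loop body for the current item and moves on to the next one:

```python
# Hypothetical input: some entries are empty strings we want to ignore
lines = ['alpha', '', 'beta', '', 'gamma']

kept = []
for line in lines:
    if line == '':
        continue  # skip the append below and go to the next item
    kept.append(line.upper())

print('kept:', kept)
```

Only the non-empty strings reach `append`, so `kept` ends up with three uppercase entries.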


### Practice
1. Implement `sum_even(numbers)` that returns the sum of the even values in a list.
2. Fix the bug in the `while` loop so it stops after printing 5 numbers.
3. Write a loop that builds a list of the cubes of numbers 1 through 5.

In [None]:
def sum_even(numbers):
    # TODO: implement
    total = 0
    # Your code here
    return total

# Buggy while loop: fix it so it prints exactly 5 numbers (0 to 4)
i = 0
while i < 5:
    print(i)
    # TODO: update i

# TODO: build a list of cubes for 1..5
cubes = []
# Your code here
print('cubes:', cubes)


## Conditionals (if/elif/else)

Use conditionals to branch based on boolean expressions. `elif` lets you check multiple cases in order.

In [None]:
def grade_letter(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    else:
        return 'F'

print(grade_letter(92))
print(grade_letter(76))
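
Because the branches are checked in order, the first condition that is true wins and the rest are skipped. The sketch below uses two hypothetical functions, `size_label` and `size_label_misordered`, to show what happens when a broad test is placed before a narrower one:

```python
def size_label(n):
    # Narrowest test first: each branch only sees values the earlier ones rejected
    if n >= 100:
        return 'large'
    elif n >= 10:
        return 'medium'
    else:
        return 'small'

def size_label_misordered(n):
    # Illustrative mistake: n >= 10 also matches 250, so 'large' is never reached
    if n >= 10:
        return 'medium'
    elif n >= 100:
        return 'large'
    else:
        return 'small'

print(size_label(250))             # large
print(size_label_misordered(250))  # medium
```

`grade_letter` above works for the same reason: it tests the highest cutoff first.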


### Practice
1. Write a function `sign(n)` that returns `"positive"`, `"negative"`, or `"zero"`.
2. Fix the conditional in `is_big` so it correctly handles values equal to 10.
3. Add an `elif` to handle the case when `age` is between 13 and 17.

In [None]:
def sign(n):
    # TODO: implement
    return ''

def is_big(x):
    if x > 10:
        return True
    else:
        return False

def ticket_type(age):
    if age < 13:
        return 'child'
    # TODO: add teen case
    return 'adult'


## Lists

Lists store ordered collections. Use indexing and slicing, and list methods like `append` and `pop`.

In [None]:
names = ['Ada', 'Grace', 'Linus']
names.append('Edsger')
print(names[0])
print(names[1:3])

last = names.pop()
print('Removed:', last)
print('Now:', names)


### Practice
1. Implement `first_last(items)` that returns a tuple `(first, last)`.
2. Fix the code so it replaces the second item with `"kiwi"`.
3. Build a list of numbers 0 through 9 using a loop.

In [None]:
def first_last(items):
    # TODO: implement
    return None

fruits = ['apple', 'banana', 'pear']
# BUG: wrong index
fruits[2] = 'kiwi'
print(fruits)

nums = []
# TODO: fill nums with 0..9
print(nums)


## Dictionaries

Dictionaries map keys to values. Access values by key, add new entries, and iterate over keys or items.

In [None]:
scores = {'Ada': 95, 'Grace': 88}
scores['Linus'] = 91
print(scores['Ada'])

for name, score in scores.items():
    print(name, '->', score)


### Practice
1. Implement `invert(d)` that swaps keys and values (assume unique values).
2. Fix the lookup so it returns 0 when a key is missing.
3. Build a frequency dictionary for letters in a word.

In [None]:
def invert(d):
    # TODO: implement
    return {}

counts = {'a': 2, 'b': 1}
# BUG: KeyError when key is missing
print(counts['c'])

word = 'banana'
freq = {}
# TODO: fill freq with letter counts
print(freq)


## Function Calls

Functions bundle reusable steps. Use parameters to accept inputs and `return` to send back a result. You can call functions with positional or keyword arguments.

In [None]:
def area_rectangle(width, height):
    return width * height

def greet(name, punctuation='!'):
    return 'Hello, ' + name + punctuation

print(area_rectangle(3, 4))
print(greet('Ada'))
print(greet(name='Grace', punctuation='.'))


### Practice
1. Implement `normalize(scores)` that divides each score by the maximum score and returns a new list.
2. Fix `double` so it returns the correct value.
3. Call `distance` with keyword arguments.

In [None]:
def normalize(scores):
    # TODO: implement
    return []

def double(x):
    y = x * 2
    # BUG: missing return

def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return (dx * dx + dy * dy) ** 0.5

# TODO: call distance using keyword arguments
# Example: distance(x1=0, y1=0, x2=3, y2=4)


## Basic Recursion

A recursive function calls itself. It needs a base case to stop, and a recursive case that moves the problem toward the base case.

In [None]:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

print(factorial(5))
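
To see the base case and the recursive case at work, it can help to print each call as it happens. This is a sketch only: `factorial_traced` and its `depth` parameter are hypothetical additions for illustration, not part of the original `factorial`.

```python
def factorial_traced(n, depth=0):
    indent = '  ' * depth  # deeper calls are indented further
    print(indent + 'factorial(' + str(n) + ')')
    if n == 0:
        # Base case: nothing left to multiply, so stop recursing
        print(indent + 'base case -> 1')
        return 1
    # Recursive case: n - 1 is one step closer to the base case
    result = n * factorial_traced(n - 1, depth + 1)
    print(indent + '-> ' + str(result))
    return result

answer = factorial_traced(3)
print('answer:', answer)
```

The calls nest until `n` reaches 0, then each pending multiplication finishes on the way back out.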


### Practice
1. Implement `sum_list_rec(nums)` that returns the sum of a list using recursion.
2. Fix the base case in `countdown` so it stops at 0.
3. Write a recursive `power(base, exp)` that computes `base ** exp` for nonnegative `exp`.

In [None]:
def sum_list_rec(nums):
    # TODO: implement
    return 0

def countdown(n):
    if n < 0:
        return
    print(n)
    countdown(n - 1)

def power(base, exp):
    # TODO: implement
    return 1


## Classes

Classes bundle data and behavior. Use `__init__` to set up instance attributes, and define methods to operate on them. The first parameter of a method is `self`.

In [None]:
class Counter:
    def __init__(self, start=0):
        self.value = start

    def increment(self, amount=1):
        self.value += amount

    def reset(self):
        self.value = 0

c = Counter()
c.increment()
c.increment(3)
print('Counter value:', c.value)
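
Each instance keeps its own attributes, and `self` is simply the instance the method was called on. The sketch below repeats a trimmed copy of `Counter` so it runs on its own; the variable names `a` and `b` are arbitrary.

```python
class Counter:
    def __init__(self, start=0):
        self.value = start

    def increment(self, amount=1):
        self.value += amount

a = Counter()
b = Counter(start=10)

a.increment()    # changes a.value only
b.increment(5)   # changes b.value only
print(a.value, b.value)  # 1 15

# a.increment(2) is shorthand for passing the instance explicitly as self
Counter.increment(a, 2)
print(a.value)  # 3
```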


### Practice
1. Add a `decrement` method to `Counter` that subtracts from `value`.
2. Fix `Accumulator.add` so it uses `self` correctly.
3. Create a `Book` class with `title`, `author`, and a method `describe()` that returns a string.

In [None]:
class Counter:
    def __init__(self, start=0):
        self.value = start

    def increment(self, amount=1):
        self.value += amount

    # TODO: implement decrement
    def decrement(self, amount=1):
        pass

class Accumulator:
    def __init__(self):
        self.total = 0

    def add(x):
        # BUG: missing self
        self.total += x

# TODO: implement Book class
class Book:
    pass


## Pandas Basics

Pandas helps you work with tables of data (rows and columns).
- A DataFrame is the whole table.
- A Series is one column.

Think of a DataFrame like a spreadsheet:
- Rows are records.
- Columns are fields.

Quick mental model:
- `df["col"]` -> a Series (one column).
- `df[["a", "b"]]` -> a smaller DataFrame.
- `df[df["score"] >= 90]` -> rows that match a condition.

If you see a `KeyError`, check `df.columns` for the exact column names.
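
The mental model above can be checked directly. This is a small sketch on a made-up three-row table (`demo` is a hypothetical name); it prints the type each kind of selection returns and shows one way to react to a `KeyError`:

```python
import pandas as pd

demo = pd.DataFrame({
    'name': ['Ada', 'Grace', 'Linus'],
    'score': [95, 88, 91],
})

one_col = demo['score']             # single brackets -> Series
two_cols = demo[['name', 'score']]  # list of names -> DataFrame
top = demo[demo['score'] >= 90]     # boolean condition -> matching rows

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(len(top), 'rows with score >= 90')

# Column names are case-sensitive, so 'Score' is not the same as 'score'
try:
    demo['Score']
except KeyError:
    print('No column named Score. Available:', demo.columns.tolist())
```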

After importing pandas, we will go step by step:
1. Build a DataFrame from a dictionary.
2. Preview the table.
3. Inspect the shape and columns.
4. Select one column (a Series).
5. Select multiple columns (a DataFrame).
6. Filter rows.
7. Add a new column.
8. Summarize with groupby.

Run the next cell top to bottom and look at the printed output after each step.

In [None]:
import pandas as pd

# Step 1: build a DataFrame from a dictionary
data = {
    'name': ['Ada', 'Grace', 'Linus', 'Edsger', 'Margaret'],
    'course': ['A', 'A', 'B', 'B', 'A'],
    'score': [95, 88, 79, 84, 92],
    'hours': [10, 8, 6, 7, 9],
}

df = pd.DataFrame(data)

# Step 2: preview the table
print('Full DataFrame:')
print(df)

print('\nFirst 2 rows:')
print(df.head(2))

# Step 3: basic info
print('\nShape (rows, cols):', df.shape)
print('Columns:', df.columns.tolist())

# Step 4: select a column (Series)
scores = df['score']
print('\nScores column:')
print(scores)

# Step 5: select multiple columns (DataFrame)
mini = df[['name', 'score']]
print('\nName + score:')
print(mini)

# Step 6: filter rows
high = df[df['score'] >= 90]
print('\nScore >= 90:')
print(high)

# Step 7: add a new column
df['score_pct'] = df['score'] / 100
print('\nWith score_pct column:')
print(df)

# Step 8: groupby to summarize
avg_by_course = df.groupby('course')['score'].mean()
print('\nAverage score by course:')
print(avg_by_course)


### Practice: Pandas

Follow the steps one by one. Replace each TODO with code.
- First run the cell as-is to see the starting table.
- After each TODO, add a `print(...)` so you can check your result.

Hints:
- Multiply columns using `df["a"] * df["b"]`.
- Filter with `df[df["col"] >= value]`.
- Group with `df.groupby("col")["other"].sum()`.
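- Sort a DataFrame with `df.sort_values("col", ascending=False)`. A groupby result like the one above is a Series, so sort it with `.sort_values(ascending=False)` and no column name.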


In [None]:
sales = pd.DataFrame({
    'item': ['apple', 'banana', 'apple', 'orange', 'banana'],
    'day': ['Mon', 'Mon', 'Tue', 'Tue', 'Wed'],
    'quantity': [3, 2, 5, 1, 4],
    'price': [1.0, 0.5, 1.0, 1.2, 0.5],
})

# Step 1: add a revenue column (quantity * price)
# TODO

# Step 2: filter rows where revenue >= 3
# TODO

# Step 3: total revenue by item
# TODO

# Step 4: sort totals by revenue descending
# TODO


## scikit-learn Basics

scikit-learn is a machine learning library. We will predict a score from study habits.
We will use linear regression (a straight-line model).

Key terms:
- Features (X): input columns.
- Target (y): the value you want to predict.
- Train/test split: keep some data for evaluation.
- `fit`: the model learns from training data.
- `predict`: the model makes guesses for new data.

What you should see:
- X shape is (10, 2).
- y shape is (10,).
- Train rows 7 and Test rows 3.

Run the next cell top to bottom, and read each printout.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Tiny dataset: hours of study and practice problems -> score
study = pd.DataFrame({
    'hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'practice_problems': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    'score': [50, 55, 60, 65, 70, 75, 80, 85, 90, 95],
})

# Step 1: split features (X) and target (y)
X = study[['hours', 'practice_problems']]
y = study['score']
print('X shape:', X.shape)
print('y shape:', y.shape)

# Step 2: train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print('Train rows:', len(X_train), 'Test rows:', len(X_test))

# Step 3: create and fit a model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: predict on the test set
preds = model.predict(X_test)
print('Predictions:', preds)
print('Actual:', y_test.tolist())

# Step 5: evaluate
mae = mean_absolute_error(y_test, preds)
print('Mean absolute error:', mae)

# Step 6: predict a new example
new_student = pd.DataFrame({'hours': [7], 'practice_problems': [14]})
print('Prediction for new student:', model.predict(new_student)[0])
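
To make the "straight-line model" idea concrete, the sketch below fits a single feature and reads the slope and intercept back from the fitted model. It assumes a reduced table, `line_data`, holding the first five `hours` and `score` values from the dataset above; the names `line_data` and `line_model` are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed subset: first five rows of the study data, one feature only
line_data = pd.DataFrame({
    'hours': [1, 2, 3, 4, 5],
    'score': [50, 55, 60, 65, 70],
})

line_model = LinearRegression()
line_model.fit(line_data[['hours']], line_data['score'])

slope = line_model.coef_[0]         # change in score per extra hour
intercept = line_model.intercept_   # predicted score at 0 hours
print('slope:', round(slope, 3))
print('intercept:', round(intercept, 3))

# A prediction is just intercept + slope * hours
hours = 6
manual = intercept + slope * hours
from_model = line_model.predict(pd.DataFrame({'hours': [hours]}))[0]
print('manual:', round(manual, 3), 'model:', round(from_model, 3))
```

On this data the slope should come out at about 5 and the intercept at about 45, and the manual calculation should agree with `predict`.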


### Practice: scikit-learn

Build a simple classifier on the Iris dataset. Follow the steps.

Hints (copy the pattern):
```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print(acc)
```

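If your accuracy is very low, check that you fit the model on the training split (`X_train`, `y_train`) and compared the predictions for `X_test` against `y_test`. Calling `predict` before `fit` raises an error rather than giving a low score.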


In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

# Step 1: split into train/test
# TODO

# Step 2: create a model (try k=3)
# TODO

# Step 3: fit the model
# TODO

# Step 4: predict on the test set
# TODO

# Step 5: compute accuracy
# TODO
