Columns

Your task here is to drop unnecessary columns and rename the remaining ones for better consistency.

Why Use Consistent Naming Conventions?

In data science projects, consistent naming conventions are crucial for maintaining readable, maintainable code. When working with datasets, especially in collaborative environments, standardized column names help team members quickly understand your data structure without having to decipher inconsistent naming patterns.

What is SCREAMING_SNAKE_CASE?

SCREAMING_SNAKE_CASE is a naming convention where:

  • All letters are UPPERCASE

  • Words are separated by UNDERSCORES (_)

  • No spaces or special characters are used

For example:

  • "first name" becomes "FIRST_NAME"

  • "Annual Revenue ($)" becomes "ANNUAL_REVENUE"

  • "customerID" becomes "CUSTOMER_ID"
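The conversions above can be sketched as a small helper. This is only one possible implementation: the function name to_screaming_snake_case and its rules (split at lowercase-to-uppercase boundaries, collapse anything non-alphanumeric into a single underscore) are assumptions made for illustration, not something pandas provides.

```python
import re


def to_screaming_snake_case(name: str) -> str:
    """Hypothetical helper: convert a column name to SCREAMING_SNAKE_CASE."""
    # Insert an underscore where a lowercase letter or digit meets an
    # uppercase letter, so "customerID" becomes "customer_ID".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of spaces and special characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Trim leftover underscores at the ends, then uppercase everything.
    return name.strip("_").upper()


for original in ["first name", "Annual Revenue ($)", "customerID"]:
    print(original, "->", to_screaming_snake_case(original))
```

Note that names made of consecutive capitals (for example "HTTPCode") are not split by this sketch, so check the results against your own column names.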

Why?

  1. Industry Standard: This convention is commonly used in database systems and data warehouses, making your skills transferable.

  2. Readability: The uppercase nature makes column names stand out in your code, easily distinguishing them from variables and functions.

  3. Consistency: Enforces a uniform style across your dataset, eliminating confusion from mixed conventions.

  4. Clarity in Code: When reading pandas operations like df['CUSTOMER_AGE'], it's immediately clear you're referring to a column name rather than a calculated variable.

To remove a list of columns, pass it to .drop() using the columns argument; this returns a new dataframe, so assign the result back. To rename columns, either pass .rename() a dictionary that maps old names to new ones through its columns argument (e.g. {"old_name": "NEW_NAME"}), or reassign the .columns property directly with a list of new names, one for each existing column.
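As a minimal sketch of these steps, the example below builds a small dataframe, drops one column, and renames the rest. The data and column names ("notes", "first name", "customerID") are made up for illustration and are not from the exercise dataset.

```python
import pandas as pd

# Made-up example data; replace with your own dataframe.
df = pd.DataFrame({
    "first name": ["Ada", "Grace"],
    "customerID": [1, 2],
    "notes": ["", ""],
})

# Drop the columns you do not need. drop() returns a new dataframe.
df = df.drop(columns=["notes"])

# Option 1: rename with an old-name to new-name mapping.
renamed = df.rename(columns={"first name": "FIRST_NAME", "customerID": "CUSTOMER_ID"})

# Option 2: reassign .columns directly (the list must match the column count).
reassigned = df.copy()
reassigned.columns = ["FIRST_NAME", "CUSTOMER_ID"]

print(list(renamed.columns))
print(list(reassigned.columns))
```

The mapping form is usually safer when you rename only a few columns, because it does not depend on column order.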

The .columns property is a pandas.Index object. Like a Series of strings, it supports string operations through the .str accessor; to convert the names to uppercase you can use .str.upper().
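A short sketch of chaining .str methods on .columns is shown below. The column names are invented for illustration, and the chain only handles stray whitespace and spaces between words; names with special characters or camelCase would need extra steps.

```python
import pandas as pd

# Empty dataframe with made-up column names, enough to show the renaming.
df = pd.DataFrame(columns=[" first name", "annual revenue", "customer id "])

# Strip surrounding whitespace, replace spaces with underscores, uppercase.
df.columns = (
    df.columns
    .str.strip()
    .str.replace(" ", "_", regex=False)
    .str.upper()
)

print(list(df.columns))
```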
