AI Glossary
The Definitive Glossary for Understanding Artificial Intelligence (AI)
| Term | Definition |
|---|---|
| Algorithm | A set of rules or steps followed to solve a problem or perform a task. |
| Artificial Intelligence (AI) | Simulation of human intelligence by machines, especially computers. |
| Machine Learning | A subset of AI where machines improve performance through experience. |
| Deep Learning | A type of machine learning using neural networks with many layers. |
| Neural Network | A model built from layers of interconnected nodes (artificial neurons), loosely inspired by the human brain, that learns to recognize patterns in data. |
| Supervised Learning | Machine learning with labeled data for training. |
| Unsupervised Learning | Machine learning with unlabeled data, finding hidden patterns. |
| Reinforcement Learning | Learning by interacting with an environment and receiving rewards or penalties. |
| Natural Language Processing (NLP) | AI that understands and processes human language. |
| Computer Vision | AI that interprets and understands visual information from the world. |
| Data Mining | The process of discovering patterns in large data sets. |
| Big Data | Extremely large data sets analyzed to reveal patterns and trends. |
| Predictive Analytics | Using data, statistical algorithms, and machine learning techniques to predict future outcomes. |
| Classification | Assigning data into predefined categories. |
| Regression | A statistical method for predicting a continuous outcome. |
| Clustering | Grouping data points into clusters based on similarity. |
| Anomaly Detection | Identifying rare items, events, or observations which differ significantly from the majority of the data. |
| Decision Tree | A tree-structured model that makes predictions by applying a sequence of if-then rules to feature values. |
| Random Forest | An ensemble of decision trees used for classification and regression. |
| Support Vector Machine (SVM) | A supervised learning model used for classification and regression analysis. |
| K-Nearest Neighbors (KNN) | A simple algorithm that stores all training cases and classifies a new case by the majority label among its k most similar stored cases. |
| Gradient Descent | An optimization algorithm used to minimize the cost function in machine learning models. |
| Overfitting | When a model learns the training data too well, including noise and details, affecting its performance on new data. |
| Underfitting | When a model is too simple and cannot capture the underlying pattern of the data. |
| Cross-Validation | A technique for assessing how the results of a statistical analysis will generalize to an independent data set. |
| Training Data | Data used to train machine learning models. |
| Test Data | Data used to test the trained model's performance. |
| Validation Data | Data held out from training and used to tune the model's hyperparameters and compare candidate models. |
| Hyperparameter | A setting chosen before training begins that controls the learning process, such as the learning rate. |
| Feature Engineering | The process of using domain knowledge to extract features from raw data. |
| Feature Selection | The process of selecting a subset of relevant features for model construction. |
| Dimensionality Reduction | Reducing the number of random variables under consideration by obtaining a set of principal variables. |
| Principal Component Analysis (PCA) | A dimensionality reduction technique that projects data onto the orthogonal directions of greatest variance, bringing out strong patterns in a data set. |
| Linear Regression | A linear approach to modeling the relationship between a dependent variable and one or more independent variables. |
| Logistic Regression | A statistical model that uses a logistic function to model a binary dependent variable. |
| Bias | Error introduced by approximating a complex real-world problem with an overly simple model. |
| Variance | Error introduced by the model's sensitivity to small fluctuations in the training set. |
| Loss Function | A function that measures how far a model's predictions are from the true values; training aims to minimize it. |
| Gradient Boosting | A machine learning technique for regression and classification problems, which builds a model in a stage-wise fashion. |
| AdaBoost | A boosting algorithm that combines multiple weak classifiers to create a strong classifier. |
| Bagging | An ensemble technique (bootstrap aggregating) that trains multiple models on random resamples of the training data and combines their predictions to reduce variance. |
| Ensemble Learning | Combining multiple models to achieve better performance than any single model. |
| Convolutional Neural Network (CNN) | A deep learning architecture that uses convolutional layers to learn spatial features, commonly used to recognize and distinguish objects in images. |
| Recurrent Neural Network (RNN) | A type of neural network where connections between nodes can create cycles, allowing output from some nodes to affect subsequent input to the same nodes. |
| Long Short-Term Memory (LSTM) | A type of RNN architecture with gated memory cells, designed to learn long-term dependencies that standard RNNs struggle with. |
| Autoencoder | A type of artificial neural network used to learn efficient codings of input data. |
| Generative Adversarial Network (GAN) | A class of machine learning frameworks in which two neural networks, a generator and a discriminator, compete with each other to generate new data. |
| Transfer Learning | A machine learning method where a model developed for a particular task is reused as the starting point for a model on a second task. |
| Tokenization | Breaking text into smaller pieces, like words or phrases, for analysis. |
| Embedding | A learned vector representation, typically of text, in which words with similar meanings have similar representations. |
| Bag of Words (BoW) | A representation of text that describes the occurrence of words within a document. |
| Term Frequency-Inverse Document Frequency (TF-IDF) | A statistical measure used to evaluate how important a word is to a document in a collection. |
| Word2Vec | A group of related models that are used to produce word embeddings. |
| Sentence Embedding | A method to represent entire sentences as vectors. |
| Named Entity Recognition (NER) | A process in NLP that locates and classifies named entities in text into predefined categories. |
| Sentiment Analysis | The process of determining the emotional tone behind a series of words. |
| Text Classification | Assigning predefined categories to text. |
| Text Generation | Using machine learning to generate new, similar text based on a given input. |
| Chatbot | A computer program designed to simulate conversation with human users. |
| Speech Recognition | The ability of a machine to identify words and phrases in spoken language and convert them to a machine-readable format. |
| Image Recognition | The process of identifying and detecting an object or feature in a digital image or video. |
| Object Detection | Identifying and locating objects within an image. |
| Image Segmentation | Partitioning a digital image into multiple segments to make it easier to analyze. |
| Generative Model | A model that learns the distribution of the data and can generate new samples from it. |
| Discriminative Model | A model that learns to distinguish between classes of data, predicting a label directly from the input. |
| Markov Decision Process (MDP) | A mathematical framework for modeling sequential decisions where outcomes are partly random and partly under the control of the decision maker. |
| Bayesian Network | A graphical model that represents the probabilistic relationships among a set of variables. |
| Hidden Markov Model (HMM) | A statistical model where the system being modeled is assumed to be a Markov process with hidden states. |
| Fuzzy Logic | A form of many-valued logic dealing with approximate, rather than fixed and exact reasoning. |
| Expert System | A computer system that emulates the decision-making ability of a human expert. |
| Heuristic | A practical rule-of-thumb approach that finds a good-enough solution quickly, without guaranteeing an optimal one. |
| Cognitive Computing | Technologies that mimic human brain function to perform tasks. |
| Autonomous Systems | Systems capable of performing tasks without human intervention. |
| Robotics | The branch of technology dealing with the design, construction, operation, and application of robots. |
| Internet of Things (IoT) | Interconnected devices that communicate and exchange data over the internet. |
| Edge Computing | Computing that’s done at or near the source of data. |
| Cloud Computing | Delivering computing services over the internet. |
| Quantum Computing | Computing using quantum-mechanical phenomena. |
| Blockchain | A decentralized digital ledger of transactions. |
| Cryptography | The practice of securing communication from third parties. |
| Cybersecurity | Protecting systems, networks, and programs from digital attacks. |
| Data Science | A field that uses scientific methods, processes, algorithms, and systems to extract knowledge from data. |
| Data Engineer | A professional who prepares ‘big data’ for analytical or operational uses. |
| Data Analyst | A professional who collects, processes, and performs statistical analyses of data. |
| Data Visualization | The graphical representation of information and data. |
| Business Intelligence (BI) | Technologies and strategies used by enterprises for data analysis and management. |
| Data Warehousing | The process of collecting and storing data from multiple sources in a central repository (a data warehouse) for reporting and analysis. |
| ETL (Extract, Transform, Load) | A process in data warehousing that pulls data out of source systems, transforms it into a consistent format, and loads it into a data warehouse. |
| SQL (Structured Query Language) | A standard programming language for relational database management and data manipulation. |
| NoSQL | A class of non-relational databases that store data in models other than tables, such as documents, key-value pairs, or graphs. |
| Hadoop | An open-source framework for storing and processing big data. |
| Spark | An open-source unified analytics engine for big data processing. |
| Tableau | A data visualization tool. |
| Power BI | A business analytics tool by Microsoft. |
| Python | A high-level programming language used for general-purpose programming. |
| R | A programming language and software environment for statistical computing and graphics. |
| Java | A high-level, class-based, object-oriented programming language. |
| C++ | A general-purpose programming language created as an extension of C. |
| TensorFlow | An open-source machine learning framework developed by Google. |
| PyTorch | An open-source machine learning library originally developed by Facebook's AI research lab (now Meta AI). |
| Keras | An open-source software library that provides a Python interface for neural networks. |
| Scikit-learn | A free software machine learning library for the Python programming language. |
| OpenCV | An open-source computer vision and machine learning software library. |
| API (Application Programming Interface) | A set of functions and procedures allowing the creation of applications that access features or data of an operating system, application, or other service. |
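
Several entries in the table (Gradient Descent, Loss Function, Linear Regression, Hyperparameter) are easier to grasp side by side in a few lines of code. The following is a minimal sketch in plain Python, not a reference implementation: the toy data, the function names `mse_loss` and `fit_line`, and the values chosen for `learning_rate` and `steps` are all assumptions made for illustration.

```python
# Toy data generated from y = 2x + 1 (made up for this illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_loss(w, b):
    """Loss function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(learning_rate=0.05, steps=2000):
    """Fit a linear regression by gradient descent on the MSE loss.

    learning_rate and steps are hyperparameters: they are set before
    training starts and are not learned from the data.
    """
    w, b = 0.0, 0.0  # model parameters, learned from the data
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line()
print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b):.6f}")
```

On this toy data the fitted values should land very close to the slope 2 and intercept 1 used to generate it. In practice a library such as Scikit-learn would handle the fitting; the loop is written out here only to make the individual terms visible.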